
Rasiah Loganantharaj Gunther Palm 
Moonis Ali (Eds.)



Intelligent 
Problem Solving 



Methodologies and Approaches 

13th International Conference on Industrial 

and Engineering Applications of Artificial Intelligence 

and Expert Systems, IEA/AIE 2000

New Orleans, Louisiana, USA, June 2000 

Proceedings 





Lecture Notes in Artificial Intelligence 1821 

Subseries of Lecture Notes in Computer Science 
Edited by J. G. Carbonell and J. Siekmann 

Lecture Notes in Computer Science 

Edited by G. Goos, J. Hartmanis and J. van Leeuwen




Berlin 

Heidelberg 

New York 

Barcelona 

Hong Kong 

London 

Milan 

Paris 

Singapore 

Tokyo 




Rasiah Loganantharaj Gunther Palm
Moonis Ali (Eds.) 



Methodologies and Approaches 



13th International Conference on Industrial 

and Engineering Applications of Artificial Intelligence 

and Expert Systems, IEA/AIE 2000

New Orleans, Louisiana, USA, June 19-22, 2000 

Proceedings 




Series Editors 

Jaime G. Carbonell, Carnegie Mellon University, Pittsburgh, PA, USA 
Jörg Siekmann, University of Saarland, Saarbrücken, Germany



Volume Editors 
Rasiah Loganantharaj

University of Louisiana, The Center for Advanced Computer Studies
2 Rex Street, Lafayette, LA 70504-4330, USA
E-mail: logan@cacs.usl.edu 

Gunther Palm 

University of Ulm, Department of Neural Information Processing 
Oberer Eselsberg, 89069 Ulm, Germany 
E-mail: palm@neuro.informatik.uni-ulm.de 

Moonis Ali 

Southwest Texas State University, Department of Computer Science 
601 University Drive, San Marcos, TX 78666-4616, USA 
E-mail: ma04@swt.edu 



Cataloging-in-Publication data applied for 

Die Deutsche Bibliothek - CIP-Einheitsaufnahme 

Intelligent problem solving : methodologies and approaches ; 
proceedings / 13th International Conference on Industrial and 
Engineering Applications of Artificial Intelligence and Expert 
Systems, IEA/AIE 2000 New Orleans, Louisiana, USA,
June 19-22, 2000. Rasiah Loganantharaj ... (ed.). - Berlin ;
Heidelberg ; New York ; Barcelona ; Hong Kong ; London ; Milan ;
Paris ; Singapore ; Tokyo : Springer, 2000
(Lecture notes in computer science ; Vol. 1821 : Lecture notes in
artificial intelligence)

ISBN 3-540-67689-9 



CR Subject Classification (1998): I.2

ISBN 3-540-67689-9 Springer-Verlag Berlin Heidelberg New York



This work is subject to copyright. All rights are reserved, whether the whole or part of the material is 
concerned, specifically the rights of translation, reprinting, re-use of illustrations, recitation, broadcasting, 
reproduction on microfilms or in any other way, and storage in data banks. Duplication of this publication 
or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, 
in its current version, and permission for use must always be obtained from Springer-Verlag. Violations are
liable for prosecution under the German Copyright Law. 

(c) Springer-Verlag Berlin Heidelberg 2000 
Printed in Germany 

Typesetting: Camera-ready by author 

Printed on acid-free paper SPIN 10721072 06/3142 5 4 3 2 1 0 




Preface 



The focus of the papers presented in these proceedings is on employing various 
methodologies and approaches for solving real-life problems. Although the 
mechanisms that the human brain employs to solve problems are not yet completely 
known, we do have good insight into the functional processing performed by the 
human mind. On the basis of the understanding of these natural processes, scientists 
in the field of applied intelligence have developed multiple types of artificial 
processes, and have employed them successfully in solving real-life problems. The 
types of approaches used to solve problems are dependent on both the nature of the
problem and the expected outcome. While knowledge-based systems are useful for 
solving problems in well-understood domains with relatively stable environments, the 
approach may fail when the domain knowledge is either not very well understood or 
changing rapidly. The techniques of data discovery through data mining will help to 
alleviate some problems faced by knowledge-based approaches to solving problems
in such domains. 

Research and development in the area of artificial intelligence are influenced by 
opportunity, needs, and the availability of resources. The rapid advancement of 
Internet technology and the trend of increasing bandwidths provide an opportunity 
and a need for intelligent information processing, thus creating an excellent 
opportunity for agent-based computations and learning. Over 40% of the papers 
appearing in the conference proceedings focus on the area of machine learning and 
intelligent agents - clear evidence of growing interest in this area. There are still many 
interesting theoretical problems and applications in other areas of artificial 
intelligence. The proceedings cover several interesting applications, and some 
theoretical concerns of intelligent systems. 

Although the contributions in these proceedings report methodologies and approaches 
in solving specific problems, we believe generalization of the processes implemented 
in these papers will, in the future, lead to more general problem solving techniques for 
intelligent systems in specific problem classes. 

The papers included in these proceedings were presented at IEA/AIE-2000, the 
Thirteenth International Conference on Industrial and Engineering Applications of 
Artificial Intelligence and Expert Systems, held June 19-22, 2000 in New Orleans, 
Louisiana, USA. The conference was sponsored by the International Society of 
Applied Intelligence, Southwest Texas State University, and the University of 
Louisiana at Lafayette, in cooperation with ACM/SIGART, the American Association 
for Artificial Intelligence, the Canadian Society for Computational Studies of 
Intelligence CSCSI/SCEIO, the Institution of Electrical Engineers, the International 
Neural Network Society, and the Japanese Society of Artificial Intelligence. Over 
120 high-quality papers were submitted to this conference. After a thorough review 
by at least two referees per paper, the program committee selected 90 papers. 







As editors of these proceedings, we are pleased to present final versions of the 
accepted papers, revised to incorporate referee comments. These contributions 
address broad topics, including agents, distributed problem solving, artificial neural 
networks, data mining, machine learning, diagnosis, expert systems, information 
systems, genetic algorithms, fuzzy logic, design, natural language processing, pattern 
recognition, and combinatorial optimization problems. About 19% of the papers in 
the proceedings address agents, their technology, and approaches to solving problems 
using agents. Another 22% of the papers discuss machine learning, knowledge 
discovery and data mining, and the application of artificial neural networks. 
Approximately 19% of the papers discuss diagnosis and the application of expert 
systems. The other major focuses of the papers are in the area of information systems 
and soft computing, with a total strength of close to 13%. There are about four papers 
in each of the remaining areas. These areas include fuzzy logic and its applications, 
design, logic, natural language processing, pattern recognition, and combinatorial 
optimization. 

We would like to express our sincere gratitude to the members of the program 
committee, the reviewers, the session chairs, and the organizing committee. 
Specifically, we would like to thank Stanislav Kurkovsky for accepting the 
responsibility of maintaining the online paper collection site, in addition to being the 
local arrangements chair for the conference. We also would like to thank the graduate 
students, Bushrod Thomas and Ryan Benton, for cheerfully helping us to complete 
several laborious tasks. I would like to thank my daughter, Nisha Loganantharaj, for 
helping me in crosschecking the authors index and table of contents. 

We would like to express our sincere thanks to all those researchers and developers 
who submitted papers, without whom we would not have had an intellectually 
stimulating conference. Finally, we would like to thank all the auxiliary reviewers 
who happily reviewed the papers and alleviated some of the strain on program 
committee members. 



May 2000 



Rasiah Loganantharaj 
Gunther Palm 
Moonis Ali 




The 13th International Conference on Industrial and
Engineering Applications of Artificial Intelligence and 
Expert Systems IEA/AIE-2000 

New Orleans, Louisiana, USA, June 19 - 22, 2000 



Sponsored by: 

International Society of Applied Intelligence 
Organized in Cooperation with 
AAAI, ACM/SIGART, CSCSI, INNS, JSAI, IEE, SWT, ULL



Organizing Committee 

General Chair: Moonis Ali, Southwest Texas State University, USA 
Program Chair: Rasiah Loganantharaj, University of Louisiana, Lafayette, USA 
Program Co-chair: Gunther Palm, University of Ulm, Germany 
Track Chair: Don Potter, University of Georgia, USA
Tutorial Chair: Anthony S. Maida, University of Louisiana, Lafayette, USA 
Workshop Chair: Debasis Mitra, Jackson State University, USA 
Publicity Chair: Khosrow Kaikhah, Southwest Texas State University, USA 
Exhibition Chair: Srini Ramaswamy, Tennessee Tech. University, USA 
Local Arrangement Chair: Stanislav Kurkovsky, Columbus State University, USA 
Registration Chair: Cheryl Morriss, Southwest Texas State University, USA 

Program Committee 

Frank D. Anger, National Science Foundation, USA 
F. Belli, University of Paderborn, Germany
Mark Boddy, Honeywell, USA 
John Bresina, NASA Ames, USA
Steve Chien, JPL, USA 

Angel P. del Pobil, Universidad Jaume-I, Spain 
Tara Estlin, JPL, USA 
Graham Forsyth, MRL, Australia 
S. Fukuda, Tokyo Metropolitan Institute of Technology, Japan 
M. Girolami, University of Paisley, United Kingdom 
Hans W. Guesgen, Auckland University, New Zealand 
Gopal Gupta, Bond University, Australia 

Tim Hendtlass, School of Biophysical Sciences and Electrical Engineering, Australia
Adele Howe, Colorado State University, USA 
L. C. Jain, Knowledge-Based Intelligent Engineering Systems, Australia 







Somnuk Keretho, Kasetsart University, Thailand 
Ramesh Kolluru, ACIM, University of Louisiana, Lafayette, USA 
Miroslav Kubat, University of Louisiana, Lafayette, USA 
Amruth Kumar, Ramapo College of New Jersey, USA 
Stanislav Kurkovsky, Columbus State University, USA 
G. Ligozat, LIMSI, France 
A. Liou, Taiwan 

Rasiah Loganantharaj, University of Louisiana, Lafayette, USA 
Anthony S. Maida, University of Louisiana, Lafayette, USA 
Bill Manaris, University of Louisiana, Lafayette, USA 
Debasis Mitra, Jackson State University, USA
L. Monostori, Hungarian Academy of Sciences, Hungary 
Robert Morris, Florida Institute of Technology, USA 
Setsuo Ohsuga, Ohkubo Shinjyuku-ku, Japan
Gunther Palm, University of Ulm, Germany 
Don Potter, University of Georgia, USA 
Henry Prade, IRIT, France 

Vijay Raghavan, University of Louisiana, Lafayette, USA 
Srini Ramaswamy, Tennessee Tech. University, USA 
A. Sattar, Griffith University, Australia
Guna Seetharaman, University of Louisiana, Lafayette, USA 
Jude Shavlik, University of Wisconsin, USA
S. N. Vassilyev, ICC, Russia 



Auxiliary Reviewers 



Ryan Gene Benton, University of Louisiana, Lafayette, USA 
Alexei E. Hmelnov, ICC, Russia 
Gerhard Kraetzschmar, University of Ulm, Germany 
Shiv Nagarajan, Griffith University, Australia 
Vineet Padmanabhan, Griffith University, Australia 
Audrey Postoenko, ICC, Russia 
Bushrod Thomas, University of Louisiana, Lafayette, USA 




Table of Contents 



Keynote Presentation 

Multisensor Data Fusion 1 

Pramod K. Varshney (Syracuse University, NY) 

Intelligent Agents I 

1 Implementing Multi-party Agent Conversations 4 

Christos Stergiou, Jeremy Pitt, Frank Guerin, and Alexander Artikis 
(Imperial College of Science Technology & Medicine) 

2 Agreement and Coalition Formation in Multiagent-Based Virtual Marketplaces 14

Luis Brito and Jose Neves (Departamento de Informática, Universidade do
Minho)

3 A Framework for the Development of Cooperative Configuration Agents 24

A. Felfernig, G. Friedrich, D. Jannach and M. Zanker (Institut für
Wirtschaftsinformatik und Anwendungssysteme)

4 Java-Based Distributed Intelligent Agent Architecture for Building Safety-Critical Tele-Inspection Systems on the Internet 34

Jae-Chul Moon, Soon-Ju Kang (School of Electronics and Electrical
Engineering, Kyungpook National University) and Nam-Seog Park
(Information Technology Lab, GE Corporate R&D)

Artificial Neural Network I 

1 The Use of AI Methods for Evaluating Condition Dependent Dynamic Models of Vehicle Brake Squeal 46

Simon Feraday, Chris Harris (University of Southampton, UK), Kihong Shin
(Hanyang University, South Korea), Mike Brennan (University of
Southampton, UK) and Malcolm Lindsay (TRW Braking Systems, UK)

2 Towards an Estimation Aid for Nuclear Power Plant Refueling Operations 56

J. A. Steele, L. A. Martin, A. Moyes, S. D. J. McArthur, J. R. McDonald
(Centre for Electrical Power Engineering, University of Strathclyde), D.
Young (British Energy Generation Ltd., East Kilbride), R. Elrick (British
Energy Generation Ltd., Barnwood), D. Howie (British Energy Generation
Ltd., East Kilbride) and I. Y. Yule (British Energy Ltd, Torness Power
Station)

3 Drilling Performance Prediction Using General Regression Neural Networks 61

V. Karri (School of Engineering, University of Tasmania)

4 Identifying Significant Parameters for Hall-Heroult Process Using General Regression Neural Network 73

F. Frost (Comalco Aluminium Limited) and V. Karri (School of Engineering,
University of Tasmania)







Data Mining I 

1 Mapping Object-Oriented Systems to Distributed Systems Using Data Mining Techniques 79

Miguel A. Serrano, Doris L. Carver (Dept. of Computer Science, LSU,
Louisiana) and Carlos Montes de Oca (Centro de Investigación en
Matemáticas, Mexico)

2 Scaling the Data Mining Step in Knowledge Discovery Using Oceanographic Data 85

Bruce Wooley, Susan Bridges, Julia Hodges, and Anthony Skjellum (Dept. of
Computer Science, Mississippi State University)

3 Information Management and Process Improvement Using Data Mining Techniques 93

W. M. Gibbons (University of Ulster), M. Ranta (Helsinki University of
Technology), T. M. Scott (University of Ulster), and M. Mantyla (Helsinki
University of Technology)

Combinatorial Optimization 

1 A Comparative Analysis of Search Methods as Applied to Shearographic Fringe Modeling 99

Paul Clay, Alan Crispin (Leeds Metropolitan University, UK) and Sam
Crossley (AOS Technology Ltd, UK)

2 Vision Guided Bin Picking and Mounting in a Flexible Assembly Cell 109

Martin Berger, Gernot Bachler and Stefan Scherer (Computer Graphics
and Vision, Graz University of Technology)

3 A Brokering Algorithm for Cost & QoS-Based Winner Determination in Combinatorial Auctions 119

Aneurin M. Easwaran and Jeremy Pitt (Imperial College of Science,
Technology & Medicine, London, UK)

4 An Overview of a Synergetic Combination of Local Search with Evolutionary Learning to Solve Optimization Problems 129

Rasiah Loganantharaj and Bushrod Thomas (Center for Advanced
Computer Studies, University of Louisiana)

Expert Systems I 

1 Maintenance of KBS's by Domain Experts: The Holy Grail in Practice 139

Arne Bultman, Joris Kuipers (ASZ Research and Development, The
Netherlands) and Frank van Harmelen (Faculty of Science, Vrije
Universiteit Amsterdam)

2 A Simulation-Based Procedure for Expert System Evaluation 149

Chunsheng Yang (National Research Council, Canada), Kuniji Kose
(Hiroshima University, Japan), Sieu Phan (National Research Council,
Canada) and Pikuei Kuo (National Taiwan Ocean University, ROC)

3 Gas Circulator Design Advisory System: A Web Based Decision Support System for the Nuclear Industry 160

J. Menal, A. Moyes, S. McArthur, J.A. Steele and J. McDonald (University
of Strathclyde, UK)







4 Expert Systems and Mathematical Optimization Approaches on Physical Layout Optimization Problems 168

Julio C. G. Pimentel (Dept. of Elect. & Comp. Eng., Laval University),
Yosef Gavriel (Dept. of ECE, Virginia Tech) and Eber A. Schmitz (NCE,
Federal University of Rio de Janeiro)

Diagnosis I 

1 Locating Bugs in Java Programs - First Results of the Java Diagnosis Experiments Project 174

Cristinel Mateis, Markus Stumptner and Franz Wotawa (Technische
Universität Wien, Institut für Informationssysteme)

2 Application of a Real-Time Expert System for Fault Diagnosis 184

Chriss Angeli (Technological Education Institute of Piraeus)

3 Operative Diagnosis Algorithms for Single-Fault in Graph-Based Systems 192

Mourad Elhadef, Bechir El Ayeb (Mathematics and Computer Science,
University of Sherbrooke, Canada) and Nageswara S. V. Rao (Oak Ridge
National Laboratory, Oak Ridge)

4 On a Model-Based Diagnosis for Synchronous Boolean Network 198

Satoshi Hiratsuka and Akira Fusaoka (Department of Computer Science,
Ritsumeikan University, Nojihigashi, Kusatsu-city, Japan)

5 DermatExpert: Dermatological Diagnosis Through the Internet 204

Hans W. Guesgen and Jeong Scon Koo (Computer Science Department,
University of Auckland)

Best Papers 

1 Aerial Spray Deposition Management Using the Genetic Algorithm 210 

W. D. Potter, W. Bi (Artificial Intelligence Center, University of Georgia),
D. Twardus, H. Thistle, M. J. Twery, J. Ghent (United States Department
of Agriculture, Forest Service) and M. Teske (Continuum Dynamics)

2 Dynamic Data Mining 220 

Vijay Raghavan and Alaaeldin Hafez (Center for Advanced Computer 
Studies, University of Louisiana) 

Information Systems I 

1 Knowledge-Intensive Gathering and Integration of Statistical Information on European Fisheries 230

Mike Klinkert, Jan Treur (Vrije Universiteit Amsterdam) and Tim Verwaart
(Agricultural Economics Research Institute LEI)

2 Using a Semantic Model and XML for Document Annotation 236

Bogdan D. Czejdo and Cezary Sobaniec (Dept. of Mathematics and
Computer Science, Loyola University, New Orleans)

3 Understanding Support of Group in Web Collaborative Learning, Based on Divergence Among Different Answering Processes 242

Tomoko Kojiri and Toyohide Watanabe (Nagoya University, Japan)







Fuzzy Logic and Its Applications 

1 Fuzzy Modeling Approach for Integrated Assessments Using Cultural Theory 250

Adrian Yazici, Frederick E. Petry (Dept. of Computer Engineering, Tulane
University) and Curt Pendergraft (The American Outback, Colorado
Springs)

2 Fuzzy Knowledge-Based System for Performing Conflation in Geographical Information Systems 260

Harold Foley (Xavier University of Louisiana) and Frederick E. Petry
(Tulane University)

3 Modeling of, and Reasoning with Recurrent Events with Imprecise Durations 272

Stanislav Kurkovsky (Dept. of Computer Science, Columbus State
University) and Rasiah Loganantharaj (Center for Advanced Computer
Studies, University of Louisiana at Lafayette)

4 Linguistic Approximation and Semantic Adjustment in the Modeling Process 284

Eric Fimbel (Centre de Recherche en Neuropsychologie, Institut
Universitaire de Gériatrie de Montréal)

5 A Fuzzy Inference Algorithm for Lithology Analysis in Formation Evaluation 290

Hujun Li (New Mexico Petroleum Recovery Research Center), Fansheng Li,
Andrew H. Sung (Department of Computer Science, New Mexico Institute of
Mining and Technology) and William W. Weiss (New Mexico Petroleum
Recovery Research Center)

Intelligent Agents II 

1 Approximating the 0-1 Multiple Knapsack Problem with Agent Decomposition and Market Negotiation 296

Brent A. Smolinski (Lawrence Livermore National Laboratory, California)

2 Design and Development of Autonomous Intelligence Smart Sensors 306

Ramesh Kolluru, Rasiah Loganantharaj, S. Smith, P. Bayyapu, G. LaBauve
(University of Louisiana at Lafayette), James Spenser, Jeffery Hooker, Steve
Simmons and T. Herbert (Intelligent Machine Concepts, Louisiana)

3 ADDGEO: An Intelligent Agent to Assist Geologist Finding Petroleum in Offshore Lands 316

Ana C. Bicharra Garcia, Paula M. Maciel and Inhauma Neves Ferraz
(Universidade Federal Fluminense, Brazil)

4 SOMulANT: Organizing Information Using Multiple Agents 322

Tim Hendtlass (Center for Intelligent Systems and Complex Processes,
School of Biophysical Sciences and Electrical Engineering, Swinburne
University of Technology)







Design 

1 Inventiveness as Belief Revision and a Heuristic Rule of Inventive Design 328

Y. B. Karasik (Nortel Networks, Canada)

2 A Decision Support Tool for the Conceptual Design of De-oiling Systems 334

Badria Al-Shihi, Paul W. K. Chung and Richard G. Holdich (Loughborough
University, U.K.)

3 ProCon: Decision Support for Resource Management in a Global Production Network 345

Florian Golm (FFA Ford Research Center Aachen) and Alexander V.
Smirnov (St. Petersburg Institute for Informatics and Automation of the
Russian Academy of Sciences)

4 Intelligent Infrastructure that Support System's Changes 351

Jovan Cakic (Computing Laboratory, University of Kent)

Diagnosis II 

1 Using Description Logics for Case-Based Reasoning in Hybrid Diagnosis 357

Yacine Zeghib, Francois De Beuvron and Martina Kullmann (LIIA, France)

2 Printer Troubleshooting Using Bayesian Networks 367

Claus Skaanning (Hewlett-Packard Company), Finn V. Jensen and Uffe
Kjaerulff (Department of Computer Science, Aalborg University)

3 Using XML and Other Techniques to Enhance Supportability of Diagnostic Expert Systems 380

G. Forsyth (DSTO, Airframes and Engines Division) and John Delaney
(eVision Pty Ltd.)

4 Learning and Diagnosis in Manufacturing Processes Through an Executable Bayesian Network 390

M. A. Rodrigues (School of Computing & Management, Sheffield Hallam
University), Y. Lui, L. Bottaci, and D. I. Rigas (Department of Computer
Science, University of Hull)

Expert Systems II 

1 Solving Large Configuration Problems Efficiently by Clustering the ConBaCon Model 396

Ulrich John (Research Institute for Computer Architecture and Software
Technology)

2 XProM: A Collaborative Knowledge-Based Project Management Tool 406

Rattikorn Hewett (Dept. of Computer Science and Engineering, Florida
Atlantic University) and John Coffey (Institute for Human & Machine
Cognition, University of West Florida)

3 Building Logistics Networks Using Model-Based Reasoning Techniques 414

Robbie Nakatsu and Izak Benbasat (University of British Columbia,
Canada)

4 A Supporting System for Colored Knitting Design 420

Daisuke Suzuki (Dept. of ICS, Nagoya Institute of Technology), Tsuyoshi
Miyazaki (Sugiyama Jogakuen University), Koji Yamada, Tsuyoshi
Nakamura and Hidenori Itoh (Dept. of ICS, Nagoya Institute of Technology)







Machine Learning and Its Applications 

1 Learning Middle Game Patterns in Chess: A Case Study 426

Miroslav Kubat (Center for Advanced Computer Studies, University of
Louisiana at Lafayette) and Jan Zizka (Masaryk University, Czech
Republic)

2 Meta-classifiers and Selective Superiority 434

Ryan Benton, Miroslav Kubat and Rasiah Loganantharaj (Center for
Advanced Computer Studies, University of Louisiana at Lafayette)

Logic and Its Applications 

1 The Formal Specification and Implementation of a Modest First Order Temporal Logic 443

Sharad Sachdev (Nortel Networks, Canada) and Andre Trudel (Acadia
University, Canada)

2 Determining Effective Military Decisive Points through Knowledge-Rich Case-Based Reasoning 453

David E. Moriarty (University of Southern California, Information Sciences
Institute)

3 A Constraint-Based Approach to Simulate Faults in Telecommunication Networks 463

Aomar Osmani and Francois Levy (Laboratoire d’informatique de Paris-Nord)

4 A Least Common Subsumer Operation for an Expressive Description Logic 474

Thomas Mantay (Universität Hamburg, Germany)

Pattern Recognition 

1 Blob Analysis Using Watershed Transformation 482

Yi Cui (Beijing University of Posts and Telecommunications, China) and
Nan Zhou (Mechanical Engineering, Texas)

2 A Novel Fusion of Holistic and Analytical Paradigms for the Recognition of Handwritten Address Fields 492

Chin Keong Lee and Graham Leedham (School of Applied Science,
Singapore)

3 PAWIAN - A Parallel Image Recognition System 502

Oliver Hempel, Ulrich Büker and George Hartmann (University of
Paderborn, Germany)

4 An Automatic Configuration System for Handwriting Recognition Problems 512

Cara OBoyle, Barry Smyth and Franz Geiselbrechtinger (Department of
Computer Science, University College Dublin)

5 Detection of Circular Object with a High Speed Algorithm 522

Adel A. Sewisy (Assiut University, Egypt)

Artificial Neural Networks II

1 Neural Network Based Compensation of Micromachined Accelerometers for Static and Low Frequency Applications 534

Elena Gaura, Nigel Steele and Richard J. Rider (Coventry University, UK)







2 Improving Peanut Maturity Prediction Using a Hybrid Artificial Neural Network and Fuzzy Inference System 543

H. L. Silvio, R. W. McClendon and E. W. Tollner (University of Georgia,
Athens, GA)

3 CIM-The Hybrid Symbolic/Connectionist Rule-Based Inference System 549

Pattarachai Lalitrojwong (Information Technology, Thailand)

4 A Neural Network Document Classifier with Linguistic Feature Selection 555

Hahn-Ming Lee, Chih-Ming Chen and Cheng-Wei Hwang (Department of
Electronic Engineering, National Taiwan University of Science and
Technology)

5 Color Pattern Recognition on the Random Neural Network Model 561

Jose Aguilar and Valentina Rossell (CEMISID. Dpto. de Computacion,
Facultad de Ingenieria, Universidad de los Andes.)

6 Integrating Neural Network and Symbolic Inference for Predictions in Food Extrusion Process 567

Ming Zhou (Department of Industrial & Mechanical Technology, Indiana
State University) and James Paik (W. K. Kellogg Institute, USA)

Natural Language Processing 

1 Automatic Priority Assignment to E-mail Messages Based on Information Extraction and User's Action History 573

Takaaki Hasegawa and Hisashi Ohara (NTT Cyber Space Laboratories,
Japan)

2 Information Extraction for Validation of Software Documentation 583

Patti Lutsky (Arbortext, Inc.)

3 Object Orientation in Natural Language Processing 591

Mostafa M. Aref (Information & Computer Science Department, King Fahd
University of Petroleum & Minerals)

Genetic Algorithm 

1 A Study of Order Based Genetic and Evolutionary Algorithms in Combinatorial Optimization Problems 601

Miguel Rocha, Carla Vilela and Jose Neves (Departamento de
Informática, Universidade do Minho)

2 Nuclear Power Plant Preventive Maintenance Planning Using Genetic Algorithms 611

Vili Podgorelec, Peter Kokol (University of Maribor, Slovenia) and Andrej
Kunej (Nuclear Power Plant Krsko, Slovenia)

3 Progress Report: Improving the Stock Price Forecasting Performance of the Bull Flag Heuristic With Genetic Algorithms and Neural Networks 617

William Leigh, Edwin Odisho, Noemi Paz (University of Central Florida,
Dept. of MIS) and Mario Paz (University of Louisville, Dept. of Civil
Engineering)

4 Advanced Reservoir Simulation Using Soft Computing 623

G. Janoski, F.-S. Li, M. Pietrzyk, A. H. Sung (Dept. of Computer Science,
New Mexico Institute of Mining and Technology), S.-H. Chang and R. B.
Grigg (Petroleum Recovery Research Center, New Mexico Institute of
Mining and Technology)







Information Systems II 

1 Forest Ecosystem Management via the NED Intelligent Information System 629

W. D. Potter, X. Deng, S. Somasekar, S. Liu (Artificial Intelligence Center,
University of Georgia), H. M. Rauscher and S. Thomasma (USDA Forest
Service, Bent Creek Experimental Forest)

2 Friendly Information Retrieval through Adaptive Restructuring of Information Space 639

Tomoko Murakami, Ryohei Orihara and Takehiko Yokota (Information-Base
Functions Toshiba Laboratory, Japan)

3 A Smart Pointer Technique for Distributed Spatial Databases 645

Orlando Karam (Wofford College), Frederick Petry (Tulane University)
and Kevin Shaw (NRL-SSC)

Distributed Problem Solving 

1 Deploying the Mobile-Agent Technology in Warehouse Management 651

Mei-Ling L. Liu, Tao Yang, Sema Alptekin (California Polytechnic State
University, California) and Kiyoshi Kato (Nihon Fukushi University, Japan)

2 A Lightweight Capability Communication Mechanism 660

David S. Robertson (University of Edinburgh, Scotland), Jaume Agusti
(Bellaterra, Catalunya), Flavio S. Correa da Silva (Universidade de São
Paulo, Brazil), Wamberto Vasconcelos (Universidade Estadual do Ceará,
Brazil), and Ana Cristina V. de Melo (Universidade de São Paulo, Brazil)

3 Model-Based Control for Industrial Processes Using a Virtual Laboratory 671

Rung T. Bui (Université du Québec à Chicoutimi), J. Perron (Alcan
International Limited) and C. Pillion (Université du Québec à Chicoutimi)

4 Autonomous Agents for Distributed Problem Solving in Condition Monitoring 683

E. E. Mangina, S. D. J. McArthur and J. R. McDonald (Department of
Electronic & Electrical Engineering, Centre for Electrical Power
Engineering, University of Strathclyde)

5 Modeling Issues for Rubber-Sheeting Process in an Object Oriented, Distributed and Parallel Environment 693

Frederick E. Petry and Maria J. Somodevilla (Department of EECS, Tulane
University)

Intelligent Agents III 

1 Reasoning and Belief Revision in an Agent for Emergent Process Management 699

John Debenham (University of Technology, Australia)

2 System Design and Control Framework for an Autonomous Mobile Robot Application on Predefined Ferromagnetic Surfaces 705

Mahmut Fettahlioglu and Aydin Ersak (EEE Dept., METU, Ankara, Turkey)

3 Intelligent and Self-Adaptive Interface 711

Hadhoum Boukachour, Claude Duvallet and Alain Cardon (LIH, Institut
Universitaire de Technologie, France)







4 Agent Architecture: Using Java Exceptions in a Nonstandard Way and an Object Oriented Approach to Evolution of Intelligence 717

Cengiz Gunay (Center for Advanced Computer Studies, University of
Louisiana)

Artificial Neural Networks III 

1 Neural Network Based Machinability Evaluation 723

Chris Nikolopoulos (Dept. of Computer Science, Bradley University), Iqbal
Shareef (Dept. of Manufacturing and Industrial Engineering, Bradley
University) and Donald Kalmes (Caterpillar Inc.)

2 Performance of MGMDH Network on Structural Piecewise System Identification 731

Ali K. Setoodehnia and Hong Li (McNeese State University, Lake Charles,
Louisiana)

3 Black-Box Identification of the Electromagnetic Torque of Induction Motors: Polynomial and Neural Models 741

Lucia Frosini and Giovanni Petrecca (Department of Electrical
Engineering, University of Pavia)

Author Index 749 




Multisensor Data Fusion* 



Pramod K. Varshney 

Department of Electrical Engineering and Computer Science 
Syracuse University 

121 Link Hall, Syracuse, NY 13244, USA 
varshney@syr.edu



Extended Abstract 

Multisensor data fusion is a key enabling technology in which information from 
a number of sources is integrated to form a unified picture [1]. This concept 
has been applied to numerous fields and new applications are being explored 
constantly. Even though most multisensor data fusion applications have been 
developed relatively recently, the notion of data fusion has always been around. 
In fact, all of us employ multisensor data fusion principles in our daily lives. 
The human brain is an excellent example of an operational fusion system that 
performs extremely well. It integrates sensory information, namely sight, sound, 
smell, taste and touch data and makes inferences regarding the problem at hand. 
It has been a natural desire of researchers in different disciplines of science and 
engineering to emulate this information fusion ability of the human brain. The 
idea is that fusion of complementary information available from different sensors 
will yield more accurate results for information processing problems. Significant 
advances in this important field have been made but perfect emulation of the 
human brain remains an elusive goal. 

More formally, multisensor data fusion refers to the acquisition, process- 
ing and synergistic combination of information gathered by various knowledge 
sources and sensors to provide a better understanding of the phenomenon un- 
der consideration. In addition, data from different sensors can be employed to 
introduce or enhance intelligence and system control functions. On the surface, 
the notion of fusion may appear to be straightforward but the design and imple- 
mentation of fusion systems is an extremely complex task. Modeling, processing, 
fusion and interpretation of diverse sensor data for knowledge assimilation and 
inferencing are challenging problems. These problems become even more difficult 
when the available data is incomplete, inconsistent or imprecise. In spite of the 
difficulties, research and development effort is being carried out vigorously due to 
the potential for significantly superior system performance. Fusion systems that 
can operate in real-time are becoming increasingly practical due to the recent 
advances in the areas of sensors, signal processing algorithms, VLSI technology 
and high performance computing and communications. 

* Research sponsored by the US Space and Naval Warfare Systems Center under Grant
N66001-99-1-8922.



R. Loganantharaj et al. (Eds.): IEA/AIE 2000, LNAI 1821, pp. 1-3, 2000.
© Springer-Verlag Berlin Heidelberg 2000






The ultimate objective of a multisensor data fusion system is to provide 
an accurate situation assessment so that appropriate actions can be taken. The 
advantage of a multisensor system over a single sensor system can be quantified in 
terms of improvement in the situation assessment ability. There are many factors 
that contribute to the enhancement of quantifiable system performance and to 
the improvement of the overall system utility. These include improved system 
reliability, robustness to failures, extended coverage, shorter response time, and 
improved system performance. When designing a multisensor data fusion system
for a specific application, there are a number of fundamental questions that need
to be addressed. Answers to these questions dictate the overall system design and
implementation. Some of the key factors are the nature of the sensors including 
their computational ability, location of the sensors, available communication 
infrastructure, and system performance objectives. 

Multiple sensors are widely used in a variety of fields [2-5]. In most applica- 
tions of fusion, basic system objectives are: detection of the presence of an object 
or a condition, object recognition, object identification/classification, tracking, 
monitoring, and change detection. The applications can broadly be classified
into two groups, namely military and nonmilitary applications. Most military
applications deal with detection, location, tracking and identification of military
entities such as ships, aircraft, weapons and missiles. These entities could be
static or dynamic. Examples of some specific military applications are ocean 
surveillance, and air-to-air and surface-to-air defense systems. Nonmilitary ap- 
plications of fusion cover a wide spectrum. Some of these are air traffic control, 
law enforcement, robotics, manufacturing, medical diagnosis and remote sens- 
ing. Techniques developed for military surveillance can readily be applied to air 
traffic control of commercial traffic as well as in the law enforcement field to 
identify and track aircraft carrying drugs and other contraband items. Another
law enforcement application under current investigation is the use of different 
sensing technologies such as millimeter wave radar, and infrared sensors to de- 
tect weapons being carried by people into places of high public interest. Robotics 
has greatly benefited from sensor fusion concepts. The main function of robots 
is to transport and manipulate objects while avoiding obstacles automatically 
in industrial and manufacturing environments. They are extremely useful in 
hazardous workplaces, e.g., nuclear power plants and hazardous waste handling 
facilities. Other industrial applications include in-process workpiece handling, 
loading and unloading of industrial trucks, welding, inspection, fault diagnosis, 
and assembly lines. Remote sensing applications employ synthetic aperture radar 
(SAR), infrared, and electromagnetic sensors for the surveillance of the earth. 
They can monitor crops, weather patterns, mineral resources, and environmen- 
tal conditions. Underground sensors can detect seismic activity while ground 
penetrating radars can detect buried hazardous waste. Another application is 
the monitoring of the planets and the solar system by means of satellites, space 
probes and the Hubble telescope. 

Advances in hardware, software and algorithms have made it possible to
employ multiple data sources for information gathering and then to use them in
a synergistic fashion to develop a composite picture of the phenomenon under
investigation. In this talk, several fundamental concepts of the important and 
rapidly evolving field of multisensor data fusion will be introduced. A conceptual 
framework to understand the data fusion process will be provided. Several issues 
related to the architecture of fusion systems will be discussed. A number of 
examples to illustrate the applications of multisensor fusion will be presented. 
The talk will end with concluding remarks that outline some open problems in 
the field. 



Bibliography 

1. P.K. Varshney, “Scanning the Special Issue on Data Fusion,” Proc. of the
IEEE, Vol. 85, pp. 3-5, Jan. 1997.

2. D.L. Hall and J. Llinas, “An Introduction to Multisensor Data Fusion,” Proc.
of the IEEE, Vol. 85, pp. 6-23, Jan. 1997.

3. J. Llinas and E. Waltz, Multisensor Data Fusion, Boston: Artech House,
1990.

4. D. Hall, Mathematical Techniques in Multisensor Data Fusion, Boston: Artech
House, 1992.

5. M.A. Abidi and R.C. Gonzalez, Eds., Data Fusion in Robotics and Machine
Intelligence, San Diego: Academic Press, 1992.

6. R. Antony, “Database Support to Data Fusion Automation,” Proc. of the
IEEE, Vol. 85, pp. 39-53, Jan. 1997.

7. B. Dasarathy, “Sensor Fusion Potential Exploitation - Innovative Architectures
and Illustrative Applications,” Proc. of the IEEE, Vol. 85, pp. 24-38,
Jan. 1997.

8. P.K. Varshney, Distributed Detection and Data Fusion, New York:
Springer-Verlag, 1997.



Implementing Multi-party Agent Conversations 



Christos Stergiou, Jeremy Pitt, Frank Guerin, Alexander Artikis
Department of Electrical and Electronic Engineering
Imperial College of Science, Technology and Medicine
Exhibition Road, London SW7 2BT, UK
{c.stergiou, j.pitt, g.guerin, a.artikis}@ic.ac.uk



Abstract. Multi-party agent conversations occur frequently in multi-agent
systems, for example in brokering. However, clear, formal and unambiguous
specifications of the semantics of such conversations are essential for open
implementation by different developers. We address this issue by generalizing a
protocol-based semantic framework for expressing the semantics of Agent
Communication Languages. This is used to give formal specifications of brokerage
protocols, which are implemented on top of an agent-oriented middleware
system. We conclude that an emphasis on design and formal specification
languages in agent communication leads to clearer interfaces, improved
development prospects and better re-use, with a potentially significant impact on
standardisation efforts.



1 Introduction 

Conversation policies and communication protocols have proved useful and indeed 
almost essential for high-level interoperability between heterogeneous agents in 
multi-agent systems (MAS). However, they have primarily been based on one-to-one 
conversations, i.e. dialogues between only two agents. It is a feature of many MAS, 
though, that there is some ‘well known’ agent (cf. ‘well known’ ports in TCP/IP 
networks) that provides generic facilities to all other agents, for example directory 
and management services. 

In particular, the KQML language specification includes a Facilitator [4], which 
also supports third-party conversations. That is, it is crucially involved in some com- 
monly recurring patterns of interaction between three agents, for recruitment, broker- 
age, subscription and recommendation. In some cases, there is even indirection, as 
one agent may not be aware of who one of the others is in the conversation. 

For developing agents in open systems, it is important that the interactions in such
conversations have a precise and unambiguous meaning. Accordingly, this paper
addresses the issue by generalizing the protocol-based semantic framework of Pitt and
Mamdani [5], developed for describing the semantics of Agent Communication
Languages (ACLs) at different levels of abstraction. Using this technique we have
implemented the various interaction patterns described in [5] on top of an “Agent
Oriented Middleware” that we have developed, called the ACTP [1], by extending the
ACTP with some extra libraries described towards the end of this paper.



R. Loganantharaj et al. (Eds.): IEA/AIE 2000, LNAI 1821, pp. 4-13, 2000.
© Springer-Verlag Berlin Heidelberg 2000







Section 2 of this paper reviews the KQML performatives and protocols and the na- 
ture of the semantic problem, and reviews the general semantic framework, which 
will be used to frame a solution. Section 3 demonstrates a solution, which is based on 
the ideas of conversation identifiers and formal specification, which serve as refer- 
ence implementation models for the protocols. Section 4 shows the implementation of 
the semantics described and we conclude in Section 5 with some remarks on the ease 
of understanding and application of the specifications, and the potential impact of this 
approach for MAS design and standardisation efforts. 

2 Background and Motivation 

2.1 Multi-Party Conversations & Protocols 

There is a definite requirement for multi-party conversations in MAS applications 
where brokerage and/or auctions are required. For example, in KQML a special class 
of agent called facilitators was introduced to perform various useful communication 
services, in particular mediation or brokerage services. As with the contract-net
protocol, the value of these services was the frequency with which they occurred in
different applications. In [3], four interaction patterns based on these services were
described, for recruitment, brokerage, recommendation and subscription (Fig. 1).

The problem with understanding and applying these diagrams is that in [3], the
semantics of KQML was an open issue. This meant some difficulty in interpretation.
For example, in the recommend protocol, agent A was supposed to ask F if there was
an agent willing to provide a certain service (e.g. ask(X)), and F would reply “once it
learns that B is willing to accept ask(X) performatives”. But this means that agent A
may have to block until B advertises, whereas in fact it would be more useful for A to
know directly that no agent has advertised the service it requires.




Fig. 1. KQML Communication facilitator services



This problem has largely been addressed in KQML by the use of Coloured Petri 
Nets [2], but it remains a problem for the FIPA ACL semantics. In a series of papers 
submitted to FIPA, an attempt has been made to express the semantics of the broker 
communicative act in the SL logic of the FIPA semantics. The experience has been 
that it is very hard to do, understand, verify and apply. 

Nevertheless, these protocols have been implemented and widely used in a variety 
of applications. Probably, a number of specific solutions have been developed and it 
is unlikely that such systems could then interoperate. What we are trying to achieve 
with this paper is to show how such protocols can be specified in a general semantic 
framework. This framework is briefly reviewed in the next section. 

2.2 The General Semantic Framework 

The protocol-based semantic framework has been introduced in [5]. The main idea
was to separate out the action-level semantics (i.e. observed speech acts) from the
intentional semantics (agent-internal reasons for acting and replying) and the
content-level semantics (i.e. the meaning of the content actually communicated).

In this framework, an ACL was specified as a 3-tuple <Perf, Prot, reply>, where
Perf is a set of performative names, Prot is a set of protocol names, and reply is a
partial function given by:

$$\mathit{reply} : \mathit{Perf} \times \mathit{Prot} \times \mathbb{N} \rightarrow \wp(\mathit{Perf}) \qquad (1)$$

The reply function gives, for each performative ‘uttered’ in the context of a
conversation being conducted according to a specific protocol (and in a given protocol
state), what performatives are acceptable replies. The reply function therefore
specifies a finite state diagram for each protocol named in Prot. An agent s
communicates with (and communicates information to) an agent r via a speech act. The
(possibly infinite) set of speech acts is denoted by speech_acts, a single member of
which is represented by:

$$\langle s, \mathit{perf}(r, (C, L, O, cp, i, ts)) \rangle \qquad (2)$$

This says that s does (communicates with) performative perf with content C in
language L using ontology O in the context of protocol cp as part of conversation
(identified by) i at time of sending ts.

The meaning of such a speech act is then given by:

$$\| \langle s, \mathit{perf}(r, (C, L, O, cp, i, ts)) \rangle \| = I_r \langle r, sa \rangle \qquad (3)$$

where the performative of sa $\in \mathit{reply}(\mathit{perf}, cp, \mathit{conv}_r(i))$.

This means that, in this framework, at the observable action level, the meaning of a 
speech act is the intention to give a reply. 
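
As an illustration of how the reply function induces a finite state diagram, the following minimal sketch encodes it as a lookup table keyed by performative, protocol and protocol state. The protocol name `query`, its states and the table contents are hypothetical and chosen only for the example; they are not taken from [5].

```python
from typing import Dict, FrozenSet, Tuple

# Hypothetical transition table for a simple ask/tell protocol named "query".
# Key: (performative, protocol, protocol state) -> set of acceptable replies.
REPLY: Dict[Tuple[str, str, int], FrozenSet[str]] = {
    ("ask", "query", 0): frozenset({"tell", "sorry"}),
    ("tell", "query", 1): frozenset(),   # conversation ends, no reply expected
    ("sorry", "query", 1): frozenset(),
}


def reply(perf: str, prot: str, state: int) -> FrozenSet[str]:
    """Partial function: raises KeyError where no entry is defined."""
    return REPLY[(perf, prot, state)]


def is_valid_reply(perf: str, prot: str, state: int, answer: str) -> bool:
    """True if `answer` is an acceptable reply to `perf` in this context."""
    try:
        return answer in reply(perf, prot, state)
    except KeyError:
        return False
```

A receiving agent would select one performative from the returned set, which corresponds to the intention to give a reply described above.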

To refine the intended semantics, three further functions are required, which are
specified relative to each agent a, and state what that agent does with a message, not
how it does it. The three functions were (1) a procedure for computing the change in
an agent’s information state from the content of an incoming message; (2) a
procedure for selecting a performative from a set of performatives (valid replies); and
(3) a function which mapped a conversation identifier onto the current state of the
protocol. From these functions, we could specify an intentional (logical) description of
the reasons for and reactions to a speech act. These then served as reference
implementation models that agent developers could use to implement the appropriate
internal and external behaviors for their agents. To fully characterise the intended
semantics, the content-level meaning is required, and this is dependent upon the
application. Therefore, the protocol-based semantic framework supports a general
methodology for designing an ACL for a particular application [5]:

Generalization at the action level: additional performatives and protocols can be
introduced to create a new ACL and new patterns of interaction;

Specialization at the intentional level: a reference implementation model, possibly
referring to agents’ beliefs, desires and intentions, could be specified to give intended
behavioral meanings for performatives in the context of the protocols;

Instantiation at the content level: the specific decision-making functionality for
deciding which reply from a set of allowed replies can also be specified. For example,
the same protocol may be used in quite different application domains and the decision
making may be dependent on a number of different factors.

We will now show how the brokerage protocols of the KQML paper [3] can be 
specified at the action level, and give logical specifications to complete the meaning 
at the intentional level. 



3 A Semantics for Brokerage 



3.1 Conversation Identifiers 



Conversation identifiers in ACLs are used to identify a particular dialogue between
two agents, so that messages with the same performative will not be confused. In
the FIPA specifications, the conversation identifier is a parameter of the ACL
message, but it is not part of the semantics.

Therefore our solution brings the conversation identifiers into the semantics, but
these need to be unique. One way of guaranteeing uniqueness is to use the FIPA
naming system for Globally Unique Identifiers (GUID) for agent names and a
timestamp generated from the system clock.

Figure 2: Parameterised conversation identifiers

1. A increments its conversation count to i+1
2. A sends performative1 to B with id (i+1, _)
3. B increments its conversation count to j+1
4. B replies with performative2 and id (j+1, i+1)




An alternative mechanism for generating unique conversation identifiers is to use a
2-tuple, and for each participant to supply one component, say an integer. Each agent
maintains a count of the number of conversations it has engaged in, and increments
this by one for each new conversation. Then, it assigns the current count for its first
speech act in the conversation (Fig. 2). If the agents on both sides of the conversation
do this, then the combination of protocol state and the conversation identifier 2-tuple
is enough to uniquely identify the conversation and which message has been
sent in reply to which other, i.e. there is no need to have separate :reply-with
and :in-reply-to message parameters in the ACL. Where no reply is expected
or required, the receiving agent need take no action about replying, but still labels the
conversation for later reference.
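
The counting scheme above can be sketched as follows. This is a minimal illustration under assumed names: the `Agent` class and its `start`, `accept` and `complete` methods are hypothetical and are not part of the ACTP. Each tuple is written from the point of view of the agent holding it, with its own reference first.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

# (own reference, peer reference or None while the peer has not yet replied)
ConvId = Tuple[int, Optional[int]]


@dataclass
class Agent:
    name: str
    count: int = 0  # number of conversations this agent has engaged in
    conversations: Dict[int, ConvId] = field(default_factory=dict)

    def start(self) -> ConvId:
        """Open a new conversation and return the identifier (i+1, _)."""
        self.count += 1
        cid = (self.count, None)
        self.conversations[self.count] = cid
        return cid

    def accept(self, incoming: ConvId) -> ConvId:
        """Label a received first speech act with this agent's own component."""
        self.count += 1
        cid = (self.count, incoming[0])  # own reference first, sender's second
        self.conversations[self.count] = cid
        return cid

    def complete(self, reply_cid: ConvId) -> ConvId:
        """Initiator side: record the peer's component found in the reply."""
        peer_ref, own_ref = reply_cid
        cid = (own_ref, peer_ref)
        self.conversations[own_ref] = cid
        return cid
```

For example, if A has had no conversations and B has had four, `a.start()` yields `(1, None)`, `b.accept(...)` yields `(5, 1)`, and `a.complete(...)` leaves A holding `(1, 5)`.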

The advantage of having one component of the 2-tuple conversation identifier
provided by each party is that each one can then label its own component with additional
information, and return the other party’s information intact. The conversation
identifier can be treated as a term (in the Prolog sense) and can contain structured
information. The functor of the term is the uniquely-assigned integer, and the arguments can
be the extra information to be interpreted by the receiving agent in context.

As a result, the conversation identifiers can be treated like “Our Reference”
and “Your Reference” labels in human information transactions (e.g. memo passing).
By convention, one uses whatever system one likes for one’s own reference, and
respects the system used by the other party. (By convention in Fig. 2, we put the sender’s
reference first and the receiver’s reference second.) The advantage for agent
communication is the facility to interpret the reference in context: for example, in Fig. 2 the
interpretation of the reference j(A,i) is for agent B to send the reply to the content of
the “memo” (speech act) received from F to agent A, quoting its reference i. Then, the
conversation identifier remains constant but the additional information can be variable
throughout the lifetime of a single conversation. We define the function cid for each agent:

$$\mathit{cid} : \mathit{Integer} \times \mathit{Agent} \rightarrow \mathit{Integer} \qquad (4)$$

This gives, for a conversation identifier and an agent, the identifier used by that
agent to identify a conversation. Using a 2-tuple identifier in the agent conversations
provides another major advantage: a degree of security and authentication in the
communications. Provided that the communication medium is not intercepted,
the use of the identifiers authenticates the interacting agents. In particular, when agent
A sends an integer to agent B in order to initiate a conversation and then receives that
integer from agent B along with another one, then agent A knows that it is talking to
agent B and no other agent. Similar handshaking mechanisms are used in many
established authentication protocols as well as in TCP. The latter protocol uses such a
handshaking algorithm before establishing a connection between two hosts in order to
authenticate and check the availability of a peer.
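
The echo check described above can be sketched as follows. This is only an illustration under stated assumptions: the `Initiator` class is hypothetical, the reference is drawn at random rather than from a counter, and the check is meaningful only under the assumption stated in the text that the medium is not intercepted.

```python
import secrets
from typing import Dict, Tuple


class Initiator:
    """Tracks references sent out and checks that replies echo them back."""

    def __init__(self) -> None:
        self.pending: Dict[int, str] = {}  # own reference -> expected peer

    def open(self, peer: str) -> int:
        """Pick a fresh reference for a conversation with `peer`."""
        ref = secrets.randbelow(2**31)
        while ref in self.pending:
            ref = secrets.randbelow(2**31)
        self.pending[ref] = peer
        return ref

    def confirm(self, sender: str, reply: Tuple[int, int]) -> bool:
        """Accept a reply (peer reference, echoed reference) only if the
        echoed integer is one we sent to this particular sender."""
        _peer_ref, echoed = reply
        if self.pending.get(echoed) != sender:
            return False
        del self.pending[echoed]  # each reference is confirmed at most once
        return True
```

A reply that quotes an unknown reference, or that arrives from an agent other than the one the reference was sent to, is rejected.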



Figure 3: Finite State Diagrams for KQML Performatives
(Panels: recruit1, recruit2, recruit3, broker1, broker2. Transition labels: A→F: recruit(),
B→F: advertise(), F→B: ask(), B→A: tell(), A→F: broker(), F→A: tell(), B→F: tell().)



3.2 Reference Implementation Models 

The KQML brokerage protocols can each be analysed as a composite of two or
three separate one-to-one conversations. For example, the one-to-one conversations
comprising the recruit and broker protocols are shown in fig. 3. We then illustrate the
responses (tropisms) to a selected speech act. The formal language for the
specifications is a modal action logic, where formulas like [a,A]p=>q are intuitively read:
“after agent a does action A, if condition p holds then q should be made true”,
where q usually denotes an intentional formula. Note that after the first speech act in
a protocol, we assume that the receiver assigns its component of the conversation
identifier, and that after that DONE(<a, performative(..., (i,j))>) is true, where (i,j) is
the conversation identifier.

The finite state diagrams for the various parts of the recruit and broker
performatives/protocols are illustrated in Figure 3. The finite state diagrams for subscribe and
recommend are simple variations on an ask/tell (question/answer) sequence. Logical
specifications of the intentional meaning of the speech acts in the context of these
protocols, together with an English paraphrase, are given below (see also [5]).

Table 1. Reference Implementation Specifications

1 $[A, \mathit{recruit}(F, \mathit{ask}(X), \mathit{recruit1}, (i,\_))]$

2 $\exists b, j, k.\ \mathrm{DONE}(\langle b, \mathit{advertise}(F, \mathit{ask}(X), \mathit{recruit1}, (k,j)) \rangle) \rightarrow I_F \langle F, \mathit{ask}(b, X, \mathit{recruit2}, (j(A,i),k)) \rangle$

- after A does a recruit act in the recruit1 protocol with conversation identifier (i,_),

- if there is an agent b who has done an advertise in the recruit1 protocol with
conversation identifier (k,j), then form the intention to ask b about X with
identifier j(A,i), meaning reply to A quoting identifier i

3 $[F, \mathit{ask}(B, X, \mathit{recruit1}, (j(A,i),k))]$

4 $I_B \langle B, \mathit{tell}(A, X, \mathit{recruit1}, (l,i)) \rangle$

- after F does an ask in the recruit1 protocol with conversation identifier (j(A,i),k),
form the intention to tell A about X in the recruit1 protocol with conversation
identifier (l,i)
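
The recruit specification can be read operationally, as in the following sketch. It is an assumption-laden illustration rather than the authors' implementation: the `Ref` term, the `Facilitator` class, the message dictionaries and the `on_ask` helper are all hypothetical names introduced for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Ref:
    """A reference treated as a term: functor integer plus extra arguments."""
    functor: int
    args: Tuple = ()


@dataclass
class Facilitator:
    count: int = 0
    # service -> list of (advertiser, advertiser's reference k, own reference j)
    adverts: Dict[str, List[Tuple[str, int, int]]] = field(default_factory=dict)

    def on_advertise(self, b: str, service: str, k: int) -> Tuple[int, int]:
        """Record that b advertised `service`; assign our own component j."""
        self.count += 1
        j = self.count
        self.adverts.setdefault(service, []).append((b, k, j))
        return (j, k)

    def on_recruit(self, a: str, service: str, i: int) -> Optional[dict]:
        """After A's recruit: if some b advertised, intend to ask b, quoting
        j(A, i) so that b knows to reply to A with reference i."""
        providers = self.adverts.get(service)
        if not providers:
            return None  # nobody advertised; A can be told so directly
        b, k, j = providers[0]
        return {"to": b, "performative": "ask", "content": service,
                "protocol": "recruit2", "cid": (Ref(j, (a, i)), k)}


def on_ask(cid: Tuple[Ref, int], content: str, own_ref: int) -> dict:
    """Provider side: interpret j(A, i) as 'tell A, quoting reference i'."""
    j, _k = cid
    a, i = j.args
    return {"to": a, "performative": "tell", "content": content,
            "cid": (own_ref, i)}
```

Returning `None` when no provider is known reflects the point made in Section 2.1 that A should learn directly that no agent has advertised the service, instead of blocking.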

4 Implementation 

This section describes the implementation of the Conversation Identifiers Mechanism
(CIM), the way performatives are resolved and how this mechanism fits in the Agent
Communication Transfer Protocol (ACTP) [1] (fig. 4). The exchange of the
Conversation Identifiers was demonstrated in the previous sections and furthermore the exact
interactions between the involved agents were outlined theoretically.

Figure 4: Framework used to test Conversation Identifiers implementation
(Three agents, Buyer, Facilitator and Seller, each with a User Interface on top; below them a
Conversation Module, a Memory and a Communication Module (ACTP); all sit on the
Communication Layer, the Agent Communication Transfer Protocol.)



Java was chosen as the main implementation language of the CIM because this
mechanism was intended to be incorporated into the ACTP library (outlined in
section 4.1). Apart from Java, the Structured Query Language (SQL) was used for the
manipulation of the database of the agents (Buyer, Seller, Facilitator). In particular,
the database transactions were performed by SQL commands that were embedded in
Java using the Java Database Connectivity (JDBC).

4.1 The Agent Communication Transfer Protocol 

The ACTP is part of an “Agent Oriented Middleware” (AOM) architecture under
development. The ACTP forms the basic platform on which the CIM is built, and is
used to show that the CIM leads to a more fault-tolerant system. The application protocols
that the ACTP [1] uses as transport mechanisms use other protocols in order to carry
out their tasks. The architecture of the ACTP comprises two levels of encapsulation:
in the higher one, the networking details of the communication processes are hidden
from the agents; in the lower one, the implementation of the underlying application
protocols is concealed from the ACTP. As shown in fig. 5, the ACTP resides in the
Application Layer and, in particular, it is above all the other application protocols. In
other words, the ACTP provides an interface between the ACL messages and the
Application Layer. In this way, the ACTP receives the communicative acts through
the ACL messages and uses the appropriate underlying communication mechanism
for the actual data transfer. If the communication is not successful, then the ACTP
will decide which protocol will be the next one to use. Currently it does so by querying a
static, pre-built database table that returns the next best protocol to be used. (For
example, if RMI fails for some reason during the process of creating a tuple of identifiers for
a particular conversation, then the ACTP will use an alternative protocol in order to
carry out this process, thus making the system more robust to underlying
failures.)
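
The fallback behaviour can be sketched as follows. The text names only RMI as a transport; the other protocol names, the `NEXT_BEST` table and the `send_with_fallback` function are assumptions made for illustration and do not describe the actual ACTP interface.

```python
from typing import Callable, Dict, Optional

# Hypothetical static table: protocol -> next best protocol (None = give up).
NEXT_BEST: Dict[str, Optional[str]] = {"RMI": "HTTP", "HTTP": "SMTP", "SMTP": None}

# A transport takes (destination, payload) and raises ConnectionError on failure.
Transport = Callable[[str, bytes], None]


def send_with_fallback(dest: str, payload: bytes,
                       transports: Dict[str, Transport],
                       first: str = "RMI") -> str:
    """Try `first`, then follow NEXT_BEST until one transport succeeds.

    Returns the name of the protocol that delivered the payload.
    """
    protocol: Optional[str] = first
    while protocol is not None:
        try:
            transports[protocol](dest, payload)
            return protocol
        except ConnectionError:
            protocol = NEXT_BEST.get(protocol)  # look up the next candidate
    raise ConnectionError(f"all transports failed for {dest}")
```

Because the table is static, the order of fallbacks is fixed in advance; the agents themselves never see which transport was finally used.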

Figure 5: The location of the CIM
(Layers shown include the interaction protocols and the ACL messages.)




The ACTP supports logical abstraction of the communication process. The
Conversation Identifiers module, which is incorporated in the ACTP (fig. 5), also
contributes to a further abstraction. The agents can thus distinguish which messages map to
which conversation and at the same time benefit from the ability to support a richer set of
speech acts (i.e. recommend, subscribe, recruit, broker and so on). Technically
speaking, the CIM uses the Java Remote Method Invocation (RMI) to handle the exchange of
the conversation identifiers between the communicating agents. The exchange of
these identifiers is performed between the ACTP modules of the agents, thus making
the exchange completely transparent to the agents.

4.2 Introduction to the Framework 

In order to demonstrate how the CIM works, we have implemented three types
of agents: Buyers, Sellers and a Facilitator. The basic layout of the framework of the
above types of agents is illustrated in fig. 4. Each type of agent has its own strategy
(e.g. a Buyer agent will buy a good at the lowest possible price) and they all
communicate with each other (usually at least once through the Facilitator), forming a virtual
marketplace. The upper two layers in fig. 4 are part of the agent execution framework,
while the lower layers are part of the communication platform, which is the ACTP.
The Facilitator agent is a generic middle agent acting (depending on the performative
chosen by the Buyer agent) as a broker, recruiter, subscriber, recommender or
matchmaker (which simply matches a request from a buyer to its equivalent in the seller
database).

In particular, the ACTP module of each agent generates a conversation identifier (cid)
reference locally. There are then two different ways to identify a conversation under
way. In the handshaking mode, the buyer agent sends a cid to the peer
agent and waits for the peer’s reply to construct the final cid tuple identifying the
conversation between the sending and receiving agents. Alternatively, the cid is sent in the
actual ACL message, and if the peer agent wishes to reply then it uses the sender’s cid
and a locally generated cid to form a new ACL reply message. From then on, all
subsequent messages in the same conversation use the cid tuple. Locally (at each
agent), the identifiers of each agent conversation are stored in the memory of the
local ACTP module. Each identifier is erased from the memory when the
conversation has ended or when the execution of an agent is terminated.



Figure 6: Conversation Module
(Components: Performative generator, Conversation Id generator, Message (performative)
Resolver, Conversation Identifiers exchanger, Agent “Memory” (stores current and previous
conversation ids), RMI Conversation id Interface. Operations shown: update, store and
retrieve Conv. Id object.)



The Conversation module that is shown as a small 3D square in fig. 4 is
decomposed in fig. 6 for the case of the handshaking mode. It includes five distinct
components shown in 3D squares. The two main components that constitute the core
of the Conversation Module are the Message Resolver (MR) and the Conversation
Identifiers exchanger (CIe). The CIe interacts with a random generator and stores or
retrieves a conversation id for a particular conversation (always initiated from the
agent above, as depicted in fig. 4) from the Agent “Memory” (which is an associative
array, but could well be a database table). The Message Resolver either takes a
particular performative as an input parameter from the agent or calls the Performative
Generator to get one. The MR also consults the ACTP Module “Memory” either
to retrieve a stored Conversation Identifier (that is, the content that it maps to) or to
update the “Memory” because a new reply has arrived. Note that when the cid is
included in the ACL message, the received message has to go through an
ACL parser to determine which conversation it belongs to.

4.3 CIM, Robustness and Experimental Results 

From the implementation above the following results were obtained. The average
number of messages required differs from one performative to another. It also
depends on whether the conversation id is embedded in the agent ACL message or
the conversation ids are exchanged beforehand in the form of handshaking. We
observed that when the cid is included inside the ACL message, fewer messages
need to be exchanged than in handshaking mode. At
first glance it seems that handling a conversation with the cid embedded in the ACL
message will be faster because fewer messages need to be exchanged. However, if
the cids are embedded in the ACL messages, it is more time consuming to parse and
interpret the entire incoming message in order to extract the cids from it. With the way
we have implemented the cids, however, we managed to get the best of both worlds.

In our model the cids are included as a wrapper around the ACL message in order to
avoid a) having to parse the messages and b) the additional overhead caused by the
messages needed for the handshaking. The insight of this experimental
implementation is that by imitating the inherent mechanistic messaging interactions of the
network (like TCP/IP) we can provide the agents, at a higher level, with a more robust
and faster way of dealing with higher level conversations. In particular, in the TCP/IP
model the handshaking takes place before the actual messages, whereas in our model
we attach the conversation identifiers to the actual message.
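
One way to picture the wrapper is as a small header placed in front of the unparsed ACL message. The sketch below is a hypothetical wire format chosen for illustration (a one-line JSON header followed by the message body); the paper does not specify the actual encoding used by the ACTP.

```python
import json
from typing import Tuple


def wrap(cid: Tuple[int, int], acl_message: str) -> bytes:
    """Prefix the ACL message with a one-line header carrying the cid tuple."""
    header = json.dumps({"cid": list(cid)})  # json.dumps emits no newlines here
    return header.encode("utf-8") + b"\n" + acl_message.encode("utf-8")


def unwrap(envelope: bytes) -> Tuple[Tuple[int, int], bytes]:
    """Read the cid from the header only; the ACL body is returned unparsed."""
    header, _, body = envelope.partition(b"\n")
    first, second = json.loads(header)["cid"]
    return (first, second), body
```

The receiver can route the message to the right conversation from the header alone, so no ACL parsing and no separate handshake message is needed for that step.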

A scenario to demonstrate the above statement would be for an agent (buyer or
seller) to assign different priorities to the conversation identifiers,
resulting in an immediate or a delayed response to a subsequently received
message, according to the agent's preferences.
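One possible reading of this scenario is sketched below, assuming that each
agent keeps a table mapping cids to numeric priorities. The `PriorityInbox`
class, the default priority and the cid labels are hypothetical and serve only
as an illustration.

```python
import heapq
import itertools


class PriorityInbox:
    """Serves messages in the order given by the priority of their cid."""

    def __init__(self, priorities, default=10):
        self.priorities = priorities        # cid -> priority (lower is served sooner)
        self.default = default              # priority of cids with no stated preference
        self._heap = []
        self._arrival = itertools.count()   # tie-breaker that preserves arrival order

    def push(self, cid, message):
        priority = self.priorities.get(cid, self.default)
        heapq.heappush(self._heap, (priority, next(self._arrival), cid, message))

    def pop(self):
        _, _, cid, message = heapq.heappop(self._heap)
        return cid, message


inbox = PriorityInbox({"cid-A": 5, "cid-B": 1})
inbox.push("cid-A", "offer 1")
inbox.push("cid-B", "offer 2")
inbox.push("cid-C", "offer 3")
served = [inbox.pop() for _ in range(3)]
```

Because the priority is looked up from the cid alone, the ordering decision can
be taken before the ACL content is interpreted.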

In addition, fault tolerance is increased in the whole framework. For instance,
let us assume that an agent is down for a few seconds and loses the conversation
that it was involved in. If that agent had stored the cids in a permanent type
of memory (e.g., a hard disk), it will be able to resume its conversations more
smoothly than in FIPA or KQML, where the conversations need to start all over
again. In addition, the Facilitator of our model does not create a greater
bottleneck than in the case of FIPA or KQML; the number of messages exchanged
in any of these schemes is exactly the same.
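The recovery argument can be sketched as follows, assuming the open
conversations are kept as a mapping from cid to protocol state and written to a
JSON file. The file name, the state labels and both helper functions are
hypothetical.

```python
import json
import os
import tempfile


def save_conversations(path, conversations):
    """Write the open conversations (cid -> protocol state) to disk."""
    tmp_path = path + ".tmp"
    with open(tmp_path, "w", encoding="utf-8") as handle:
        json.dump(conversations, handle)
    os.replace(tmp_path, path)  # swap the file in one step to avoid half-written data


def load_conversations(path):
    """Return the stored conversations, or an empty mapping if none were saved."""
    if not os.path.exists(path):
        return {}
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


store = os.path.join(tempfile.mkdtemp(), "cids.json")
save_conversations(store, {"buyer-1/seller-7": "awaiting-reply"})
# ... the agent goes down and restarts here ...
resumed = load_conversations(store)
```

After a restart the agent can look up each stored cid and continue from the
recorded state instead of opening a new conversation.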

5 Summary and Conclusions 

In this paper, we have taken a general semantic framework for specifying the seman- 
tics of an Agent Communication Language, which concentrated on one-to-one con- 
versations, and extended it to cover multi-party conversation as found in brokerage 
protocols. The specific extensions introduced were: 







• Increased range of the possible intended replies, so that the “meaning” of a
speech act between a speaker and hearer in one protocol could, at the ‘action’
level, be the intention to perform another speech act in a new protocol with a
third party;

• Conversation identifiers that were structures (2-tuples), one element of which
was supplied by each party to the speech act, and could be parameterized with
additional information. The identifiers were then “brought into” the semantics
at the ‘intentional’ level rather than being considered as pragmatics as in the
FIPA ACL semantics;

• A richer representation of protocol states, so that states were no longer
single integers but could include a number of variables. These variables could
change value after either the speaker sent the message or the hearer received
it, and reflect the fact that both the speaker and hearer are changing state
when a speech act is performed.

To conclude, we used these extensions to implement the specifications in
agents that use the ACL. We need these extensions to specify multi-party
conversations for brokerage protocols and to implement these specifications. We
argue that the emphasis on design and formal specifications leads to clearer
interfaces, improved development prospects and better re-use, with a potential
impact on standardisation efforts. If the semantic framework specification layer
and message transfer mechanisms are taken as a standard, it would, we would
argue, increase the likelihood of dynamic open multi-agent systems comprising
heterogeneous interoperating agents.

Acknowledgements 

This work has been undertaken in the context of the UK EPSRC and Nortel Networks 
joint-funded project CASBAh (Common Agent Service Brokering Architecture, 
GR/L34440), and support from these organizations is gratefully acknowledged. 

References 

[1] Artikis, A., Pitt, J. and Stergiou, C. The Agent Communication Transfer
Protocol. To appear in the Proceedings of the Autonomous Agents 2000 Conference.

[2] Cost, R., Chen, Y., Finin, T., Labrou, Y. and Peng, Y. Using Colored Petri
Nets for Conversation Modeling. In F. Dignum and B. Chaib-draa (eds.): IJCAI'99
Workshop on Agent Communication Languages, Stockholm, 1999.

[3] Finin, T., Labrou, Y. and Mayfield, J. KQML as an Agent Communication
Language. In J. Bradshaw (ed.): Software Agents, AAAI Press, pp. 291-316, 1997.

[4] FIPA. FIPA'97 Specification Part 2: Agent Communication Language. 1997.

[5] Pitt, J., Guerin, F. and Stergiou, C. Protocols and Intentional
Specifications for Multi-Party Agent Conversations for Brokerage and Auctions.
To appear in the Proceedings of the Autonomous Agents 2000 Conference.

[6] Pitt, J. and Mamdani, A. Designing Agent Communication Languages for
Multi-Agent Systems. In F. Garijo and M. Boman (eds.): Multi-Agent System
Engineering, LNAI 1647, Springer, pp. 102-114, 1999.




Agreement and Coalition Formation in Multiagent-Based Virtual Marketplaces



Luis Brito and Jose Neves 



Departamento de Informatica 
Universidade do Minho, Braga, PORTUGAL 

lbrito@gil.di.uminho.pt
jneves@di.uminho.pt



Abstract. Industrial and commercial enterprises deal with individuals
which, under pressure from the market, may cartelize. These cartels, or
alliances of parties or interests to further common aims, may be seen
as active or dormant; i.e., an active alliance may occur when, under an
individual's initiative, an agreement is negotiated, whereas a dormant one
arises by the drop-out of some agent expecting to obtain future gains by
acting that way.

In Electronic Commerce (EC), especially in a Virtual Markets (VMs)
environment, one is faced with two main contenders: the client and the
provider, entities that may be located at different physical sites and
conduct their dealings in environments that mimic traditional marketplaces,
much like the traditional kasbah.

On the other hand, in a multiagent setting, one has to consider the proper
protocols for multiagent interaction. Indeed, a natural approach relies on
upholding or favoring democracy or the interests of the common players.
This approach is interesting but it usually relies on the assumption that all
agents are truthful in the advertisement of their information; real-world
dealings thrive on incomplete or even misleading information.
Agreement protocols used in these systems often rely on models that assume
characteristics difficult to reach in a real-world environment. Nevertheless
the theoretical implications that come from the study of such systems are
paramount, and will be the object of this work.

Keywords: Multiagent Systems, Intelligent Agents, Agreement, Coali- 
tion, Virtual Marketplaces 



1 Introduction 

The traditional approach to Electronic Commerce (EC) mimics the behavior of
real-world commercial agents, based on two kinds of opposite and self-centered
agents: the client and the provider. The providers' agents may leave their
self-centered attitudes behind and tend to be highly cooperative, generating
alliances that can be seen as active or dormant. With a dormant attitude, an
agent, in



R. Loganantharaj et al. (Eds.): IEA/AIE 2000, LNAI 1821, pp. 14-23, 2000.
© Springer-Verlag Berlin Heidelberg 2000






spite of not receiving any direct compensation, does not pose itself as a competi- 
tor (perennial compensations are expected such as the establishment of inter- 
provider friendship). An active attitude bases itself on the fact that, if an agent 
wishes to cooperate, it must pass information to third parties. 

Agents with an active attitude are extremely interesting, leading, in
traditional approaches, to over-simplified agreement protocols. As one knows, in
the real world, agents are scarcely truthful. This fact renders the typical
majority-vote protocol useless. Drawing from the Distributed Systems (DS) arena,
which has studied the problems of faulty distributed behavior, one can devise
protocols that minimize pernicious influences from lying agents.

The architecture of the client's and the provider's agents is basically the
same; indeed, in the real world, a provider and a client may present some
common basic personality traits, although with diverse particular
instantiations; i.e., in this context, personality is to be understood as the
sum of all the behavioral and mental characteristics by means of which an
individual is recognized as being unique.

Price evolution is extremely important in offer/counter-offer dealings. An
improper proposal can lead to financial disaster. Some authors base their prices
on continuous functions. Although valid, this approach seems to be better
oriented to long-term dealings rather than to short-term (and very-short-term)
ones. A price series may position itself as the best approach to short-term
dealings (with the added benefit of easier calculation) [4].

2 The Agreement Protocols 

DS has studied the problem of agreement protocols with the goal of using them in
operations such as distributed database transactions and clock synchronization.

Agreement can be reached by having each processor send its vote and taking
the minimum, maximum, mean, or some other kind of metric on the votes. This
is a formal or convenient stance, since in reality processors can fail and send,
consciously or not, conflicting values to the intervening parties. Processors
need to exchange values among themselves, trying to minimize the effects of
faulty behavior. A processor refines the knowledge it has about another
processor's parameter as it learns about the parameters other processors
advertise. This process is made possible by an agreement protocol.

The term processor can be seen as the DS counterpart of an agent in MAS;
therefore, the assumptions made for a DS system can easily be transported to
the MAS one.

2.1 The System Model 

A typical model used for the study of agreement protocol problems assumes
that:

— in an n-processor environment there are, at most, m faulty processors (n > m);

— the system is fully connected (i.e., the processors can directly communicate
with each other by message passing mechanisms);






— the receiver processor is familiar with the identity of the sender;

— the communication medium is reliable (i.e., all messages are delivered
without failure, the processors being the only error-prone elements).

The value to be agreed upon is typically a binary element taken from the set
{0, 1}. This boolean-like property enables polls to be conducted by asking each
agent for a yes/no answer on some prevailing proposal. Multivalue agreement
protocols in DS have also been an object of study, but their inherent complexity
is, as expected, far greater.
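A small sketch shows why a plain yes/no poll breaks down once an agent reports
different votes to different parties. The agent names, the vote table and the
rule that ties resolve to "no" are assumptions made for the example.

```python
from collections import Counter


def majority(votes):
    """Return 1 ('yes') only on a strict majority of yes votes, else 0 ('no')."""
    counts = Counter(votes)
    return 1 if counts[1] > counts[0] else 0


# Votes as received by two honest observers, A and B.
# agent1 and agent2 are honest; agent3 tells A "yes" and B "no".
received_by_a = {"agent1": 1, "agent2": 0, "agent3": 1}
received_by_b = {"agent1": 1, "agent2": 0, "agent3": 0}

decision_a = majority(received_by_a.values())
decision_b = majority(received_by_b.values())
```

The two honest observers apply the same rule to what they received and still
reach opposite conclusions, which is the situation an agreement protocol is
meant to prevent.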

The way a protocol flows depends, in many ways, on the way messages are
exchanged. The use of authenticated messages depends on the capability
to determine, with absolute certainty, their origin; i.e., in closed network
environments (typically with specific point-to-point connections) one is able
to determine their origin directly; on the other hand, in an open environment,
such as the Internet, the use of cryptographic tools is necessary. The other
kind of messages that can be exchanged are non-authenticated messages, also
called oral messages. Agreement is easier to reach in an authenticated message
environment, bearing in mind the overhead induced by the use of cryptographic
tools.

2.2 Agreement Classification 

A situation in which agreement is necessary can usually be reduced to one of
three well-known problems: the Byzantine agreement problem, the consensus
problem and the interactive consistency problem.

A classification for agreement problems can be reached taking into account
two characteristics: the number of elements that initiate the protocol and the
kind of element that is to be agreed upon. The Byzantine agreement problem is
initiated by one of the agents (in MAS terms) and the final agreement is
intended to be a single value (typically a yes/no value). The consensus problem
arises when all agents initiate a process of agreement in order to reach a
single value conclusion (once again, typically yes/no). The interactive
consistency problem is initiated by all agents in order to reach a vector of
values as the final conclusion.

It can be proven that a solution for the Byzantine agreement problem can be
taken as a primitive solution; i.e., through the use of such a solution one can
easily reach solutions for the consensus problem and for the interactive
consistency problem.

2.3 The Solution to the Byzantine Agreement Problem: The 
Lamport-Shostak-Pease Algorithm 

Using non-authenticated messages it has been shown that, in a fully connected
network with n processors, an agreement can be reached only up to a maximum
number of faulty processors; i.e., it can be proven that the number of faulty
processors, m, cannot exceed (n - 1)/3, n being the total number of processors
in the network. Extrapolating to a MAS environment made of n agents, one
may say that at most m agents may be lying.
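The bound can be turned into a short check. The function names are
illustrative; the arithmetic follows directly from m ≤ (n - 1)/3, or
equivalently n ≥ 3m + 1.

```python
def max_faulty(n):
    """Largest m that satisfies n >= 3m + 1, i.e. floor((n - 1) / 3)."""
    if n < 1:
        raise ValueError("n must be a positive number of agents")
    return (n - 1) // 3


def agreement_possible(n, m):
    """True if n agents can tolerate m lying agents under oral messages."""
    return 0 <= m <= max_faulty(n)


tolerated = {n: max_faulty(n) for n in (3, 4, 7, 10)}
```

For instance, three agents cannot tolerate a single liar, whereas four agents
can.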






Following the studies conducted by Pease, an algorithm based on oral messages
was developed for the Byzantine agreement problem by Lamport, which is
known as the Lamport-Shostak-Pease Algorithm (LSPA) or Oral Message Algorithm
(OMA) (Figure 1) [5].



PROCEDURE LSPA (faulty_agents) : INTEGER
VAR
    PROCESSOR_IDENTIFICATION processor_id;
    INTEGER value;
BEGIN
    IF (faulty_agents == 0)
    THEN
        send_to_all (value);
        value = assert_value_from_source ();
    ELSE
        send_to_all_except_source (value);
        FOR EACH processor ∈ { i | processor (i) ∧ i.processor_id ≠ processor_id
                                   ∧ i.processor_id ≠ source_id }
            REMOTE START processor.LSPA (faulty_agents - 1)
        results = vector_of_remote_results ();
        value = majority (results);
    END IF
    RETURN value;
END.



Fig. 1. Approximation to the LSPA algorithm used as a solution to the Byzantine 
agreement problem 



As one can see, the LSPA is quite complex. The agreement is achieved
through a recursive procedure that can become extremely hard to compute (i.e.,
the recursion is deep).
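The recursion can be made concrete with a centralized simulation of the oral
message algorithm. This is a sketch under stated assumptions rather than a
distributed implementation: all agents live in one process, a lying sender is
modelled as alternating 0 and 1 across its receivers, ties in the majority fall
back to 0, and the function and agent names are hypothetical.

```python
from collections import Counter


def majority(values, default=0):
    """Most frequent value; a tie falls back to `default`."""
    ranked = Counter(values).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return default
    return ranked[0][0]


def oral_messages(source, receivers, value, m, liars):
    """Simulate OM(m) and return the value each receiver settles on."""
    received = {}
    for position, agent in enumerate(receivers):
        # Assumption: a liar sends alternating values, an honest source sends `value`.
        received[agent] = position % 2 if source in liars else value
    if m == 0:
        return received

    # Every receiver relays what it got to the other receivers with OM(m - 1).
    relayed = {agent: [] for agent in receivers}
    for relay in receivers:
        others = [agent for agent in receivers if agent != relay]
        sub = oral_messages(relay, others, received[relay], m - 1, liars)
        for agent, heard in sub.items():
            relayed[agent].append(heard)

    return {agent: majority([received[agent]] + relayed[agent])
            for agent in receivers}


# Four agents, one liar among the receivers: the honest ones follow the source.
with_four = oral_messages("A", ["B", "C", "D"], 1, m=1, liars={"D"})
# Four agents, lying source: the honest receivers still agree with each other.
lying_source = oral_messages("A", ["B", "C", "D"], 1, m=1, liars={"A"})
# Three agents, one liar: n < 3m + 1, and honest B no longer follows the source.
with_three = oral_messages("A", ["B", "C"], 1, m=1, liars={"C"})
```

Each level of recursion starts one sub-protocol per receiver, which is where the
rapid growth in the number of exchanged messages comes from.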



3 The System Architecture 

The system architecture is based on the assumption that any company
establishes, through a series of agents, a Virtual Market (VM) that forms the
environment in which those agents, and some counterpart agents, are to engage
in commercial dealings. This image can be extended to the point where the VM is
composed of multiple client and provider agents representing the interests of
multiple entities.

Traditional multiagent approaches to EC rely mainly on auction mechanisms
and on a basic subscription of principles among non-allied agents. As
one knows, in the real world, prices are dictated by the coalition efforts of
multiple service providers, where the clients' agents must maximize their
pay-off by establishing an order under which concurrent proposals are to be
treated. On the other hand, and according to the general rules stated before,
it is possible to distinguish between a client's agent and a provider's one;
i.e., over common basic personality traits it is possible to differentiate
between the two (Figure 2).








[Figure 2 labels: Beliefs, Desires, Intentions (Introspective Knowledge);
General Knowledge; Negotiation Procedures; Personality Differentiation]




Fig. 2. A layout for the client’s (a) and provider’s (b) agents 



3.1 A Community of Agents 

As in human society, an agent must develop an attitude-based system of beliefs,
desires and intentions and maintain a certain state of tolerance towards
counterpart elements. A process of community registration is supposed to develop
and endure, which in this work will be approximated by a blackboard approach
(Figure 3), being materialized by the extension of the relation attitude(), in
the form:

attitude: Agent, Type_of_Agent, (Beliefs, Desires, Intentions) → {true, false, unknown}

where Agent, Type_of_Agent and (Beliefs, Desires, Intentions) (BDI) stand,
respectively, for the agent to which the BDI system applies, the agent's type
and the BDI system itself (e.g., a provider's agent cannot be accepted into a
community where there is a BDI system that disables coalition formation).
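A three-valued relation of this kind can be sketched as a table of asserted
facts in which anything never asserted is unknown rather than false. The
`Blackboard` class, its method names and the sample BDI tuple are hypothetical.

```python
from enum import Enum


class Truth(Enum):
    TRUE = "true"
    FALSE = "false"
    UNKNOWN = "unknown"


class Blackboard:
    """Stores the extension of attitude(Agent, Type_of_Agent, BDI) as explicit facts."""

    def __init__(self):
        self._facts = {}

    def assert_fact(self, agent, agent_type, bdi, holds):
        self._facts[(agent, agent_type, bdi)] = Truth.TRUE if holds else Truth.FALSE

    def attitude(self, agent, agent_type, bdi):
        # A fact that was never asserted is unknown, not false.
        return self._facts.get((agent, agent_type, bdi), Truth.UNKNOWN)


board = Blackboard()
bdi = ("prices are negotiable", "sell stock", "join coalitions")
board.assert_fact("p1", "provider", bdi, True)
board.assert_fact("p2", "provider", bdi, False)

known = board.attitude("p1", "provider", bdi)
rejected = board.attitude("p2", "provider", bdi)
missing = board.attitude("p3", "provider", bdi)
```

Keeping unknown separate from false lets a community distinguish an agent that
was explicitly refused from one that has simply not registered yet.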



[Figure 3 labels: Dealings; Being Independent; Being in a Coalition; Clients;
Message Exchange; Providers]



Fig. 3. A blackboard based system used for inter-agent communication 



The Client's Agents. The client's agents act as auctioneers in the multiagent
marketplace; i.e., they conduct the auctions by announcing the lots and
controlling the bidding.

Each agent has defined, within itself, an auction time. Due to the
asynchronous nature of the system it is necessary to ensure that there is
enough time








for at least some of the providers' agents to call back. As in traditional
marketplaces, the solution falls into a back-off period at the end of which all
answers are ordered according to a predefined order relation. This relation
defines the utility value for a certain offer; i.e., having in mind the
self-interest of the agent, the offers are thus answered according to a certain
chronological sequence.

The agents were designed so that the handling of an answer precedes that
of an offer. The procedure that regulates the client's agent's behavior is shown
in Figure 4.



PROCEDURE client_agent (Agent_Attitude ap, Stimuli_Queue sq)
VAR
    LIST provider_feedback, provider_answers,
         provider_offers, ordered_offers;
BEGIN
    accept_community_values (ap);
    register_into_community (ap);
    REPEAT
        sleep (ap.time);
        provider_feedback = [provider_answer, provider_argument];
        WHILE NOT exists (sq, provider_feedback, ap.code)
        END;
        provider_answers = recover_answers (sq, ap.code);
        provider_offers = recover_offers (sq, ap.code);
        respond_to_answer (provider_answers);
        ordered_offers = order_offers (provider_offers, ap.negotiation_kind);
        respond_to_offer (ordered_offers);
    UNTIL conclusion (Provider, Price);
    retire_from_community (ap);
END.



Fig. 4. The behavioral control algorithm for the clients’ agents 



General Knowledge and the Definition of a Metric Through an Order
Relation. The clients have, during the course of their dealings, the ability to
accumulate historical information about providers and their products. This
historical information can be kept in a by-deal way or, to make reasoning
easier, in a cumulative way. This information is part of the general knowledge
that accompanies each one of the client's agents. The Knowledge-Base (KB) of
each agent is given in terms of, among others, the predicates' extensions:

(i) product: Code, Name, Obs → {true, false, unknown}.

(ii) provides: Code, Who, Price, Quality, Payment → {true, false, unknown}.

(iii) provider: Who, Name, Latency, Fidelity, Delay → {true, false, unknown}.

where in (i) Code, Name and Obs stand, respectively, for the product's code, its
name and a comment or remark; in (ii) Code, Who, Price, Quality, and Payment
stand, respectively, for the product's code, the entity that provides it, the
price at which it is provided, the quality assessment, and the payment method
(e.g., 30-days, 60-days, 90-days); in (iii) Who, Name, Latency, Fidelity






and Delay stand, respectively, for the provider's code, its name, the order's
latency period (i.e., the time that elapses between the order's placement and
the order's acknowledgement), the obligations the client has towards the
provider, and the usual delay on orders placed with this provider.

Historical information is an important factor for the establishment of a
metric; indeed it allows the orders' chronological sequence to be set, but falls
short of being a universal standard; i.e., the metric used by each client's
agent must be easily customizable to suit particular needs. It may be given in
the form:

metric = Price - preference(Payment) - Quality + Latency + Delay - Fidelity

where a higher value indicates that a particular dealing is not interesting
while, on the other hand, lower values (from the point of view of a particular
agent) point to more interesting scenarios.
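The metric and the ordering it induces can be sketched as follows. The `Offer`
record, the payment preference table and all sample figures are invented for
the illustration, and every term is assumed to be already expressed on a common
numeric scale.

```python
from dataclasses import dataclass


@dataclass
class Offer:
    provider: str
    price: float
    payment: str
    quality: float
    latency: float
    delay: float
    fidelity: float


# Assumed preference of one particular client: longer payment terms are worth more.
PAYMENT_PREFERENCE = {"30-days": 0.0, "60-days": 1.0, "90-days": 2.0}


def metric(offer, preference=PAYMENT_PREFERENCE):
    """Lower values mark more interesting dealings."""
    return (offer.price - preference[offer.payment] - offer.quality
            + offer.latency + offer.delay - offer.fidelity)


def order_offers(offers):
    """Most interesting offer first."""
    return sorted(offers, key=metric)


offers = [
    Offer("p1", 100.0, "30-days", 5.0, 2.0, 3.0, 1.0),
    Offer("p2", 95.0, "90-days", 4.0, 1.0, 6.0, 0.0),
]
ranked = [offer.provider for offer in order_offers(offers)]
```

A client with other needs would swap in its own preference table or weights,
which is the customization the text calls for.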



Price Evolution During Negotiation. In traditional marketplaces the prices
clients and providers put forward seldom coincide; indeed, it is usual to see
streams of offers/counter-offers being exchanged among clients and providers.
The prices of the goods or services that are the object of the counter-offers
can be modeled in a variety of ways; i.e., one can state price evolution in
terms of continuous functions or in terms of time-series.

As should be expected, counter-offer prices have, in a client's agent scenario,
a tendency to grow; i.e., in order to strike a deal the prices will increase
during the negotiation process.

The way the prices evolve is a characteristic of each agent. It may take the
form:



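$$\mathit{price}^{i}_{client} = \mathit{price}^{i-1}_{client} + \frac{\mathit{price}^{i-1}_{provider} - \mathit{price}^{i-1}_{client}}{2}$$

where $\mathit{price}^{i}_{client}$ denotes the client's agent price at instant
$i$, and $\mathit{price}^{i}_{provider}$ denotes the provider's agent one at the
same instant. In terms of the providers' agents it may take the form:

$$\mathit{price}^{i}_{provider} = \mathit{price}^{i-1}_{provider} - \frac{\mathit{price}^{i-1}_{provider} - \mathit{price}^{i-1}_{client}}{2}$$

i.e., the prices of the goods or services should drop with time.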
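The evolution of both price series can be simulated with a short sketch in which
each side closes half of the remaining gap per step. The sketch assumes that the
two sides answer each other in turn, so the provider reacts to the client's
newest counter-offer; if both sides instead updated simultaneously from the same
previous prices, they would meet at the midpoint after a single step. The
function name, the tolerance and the starting prices are hypothetical.

```python
def negotiate(client_price, provider_price, tolerance=0.5, max_rounds=50):
    """Return the list of (client price, provider price) pairs, one per round."""
    history = [(client_price, provider_price)]
    for _ in range(max_rounds):
        if provider_price - client_price <= tolerance:
            break
        # The client raises its price by half of the current gap.
        client_price += (provider_price - client_price) / 2
        # The provider then lowers its price by half of the gap that is left.
        provider_price -= (provider_price - client_price) / 2
        history.append((client_price, provider_price))
    return history


history = negotiate(60.0, 100.0)
client_series = [client for client, _ in history]
provider_series = [provider for _, provider in history]
```

The client series only grows and the provider series only drops, and the gap
between them shrinks by a factor of four per round under these assumptions.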



The Provider's Agents. The providers' agents are the most complex elements
in the system. They are not only able to strike a deal with a client's agent,
but also to cooperate. Under this scenario one can distinguish two alternatives:
cooperation through drop-out (dormant cooperation) and cooperation through
agreement (active cooperation); i.e., as in the real world, a provider's agent
can






abandon a negotiation if asked to do so by some other agent (friendship or the
search for future dividends) or it can continue to negotiate but in agreement
with other agents.

Agreement is reached through an approximation to the LSPA. Thus, reaching an
agreement in this kind of multiagent system is very similar to reaching an
agreement in distributed systems. Instead of faulty processors one has agents
that, due to a lack of interest in a proposed agreement, may lie.

Due to the asynchronous nature of the system, a meditation time has been
defined for each agent. This back-off period enables the agent to accumulate
messages in the stimuli queue. The providers' agents behave according to the
procedure shown in Figure 5.



PROCEDURE provider_agent (Agent_Attitude ap, Stimuli_Queue sq)
VAR
    LIST client_feedback;
    MESSAGE message;
BEGIN
    accept_community_values (ap);
    register_into_community (ap);
    REPEAT
        sleep (ap.meditation_time);
        /* determine if it tries to make some agent drop-out or
           if it wants to start an agreement */
        cooperation_impulse (ap);
        /* feedback treatment */
        client_feedback = [client_answer, client_argument, intention, drop_out, vote];
        WHILE NOT exists (sq, client_feedback, ap.code)
        END;
        message = first_element (sq);
        /* deal with the specificity of the message */
        behavior_pattern (ap, message);
    UNTIL conclusion (Client, Price);
    drop_from_community (ap);
END.



Fig. 5. The behavioral control algorithm for the providers' agents



Passive or Dormant Cooperation. Passive or dormant cooperation is perhaps one
of the simplest and most ancient forms of cooperation. In fact, although a
formal alliance among the involved parties does not exist, there is a tacit
agreement among parties. Non-intervention is as important as active agreement.
This fact is easily conveyed in war and human rights violation scenarios, where
non-intervention (neutrality) by a country or community acts, in reality, as
support for one of the conflicting factions.

One can state that passive cooperation deals with two forms of compensation
for dropping out of a negotiation: the subjective and the futuristic. The
former deals with values like friendship or kindness. The latter deals with the
foresight of future monetary compensation or the reinforcement of partnerships.






Active Cooperation. The achievement of an agreement among the providers'
agents may lead to disturbance in the traditional free-market approaches to
VMs. Providers' agents have the ability to cartelize, which will deprive the
clients' agents of reaching good deals; i.e., one may fall into a monopoly
scenario where the products' prices on offer converge (the typical take it or
leave it situation).

In active cooperation the parties must take a stand; i.e., they must state their
position in what concerns the establishment of a common price. An agreement
is now given as the extension of the predicate:

agreement: Who, Price, Compensation, Parties → {true, false, unknown}

where Who, Price, Compensation and Parties stand, respectively, for the
initiating agent, the price to be agreed upon, the agent's compensation, and the
list of the agents involved in the deal.

An agent states, by voting, its position towards an agreement. The main
difference between this approach and the traditional ones based on a majority
vote resides in the assumption of truthfulness. One assumes that an agent that
is uninterested in the deal, fearing the formation of a coalition that may
damage future outcomes, may lie; i.e., an agent may inform some agents that it
agrees with the transaction, stating exactly the opposite to the others. Through
the use of algorithms such as the LSPA one is able to minimize the influence of
faulty parties while safeguarding the right to vote. As has been said earlier,
one must ensure that the total number of agents is greater than or equal to
3m + 1, where m is the number of faulty ones.

The initiation of a passive or active cooperation resides with the attitude of
each agent towards the deal. One is faced with two main scenarios: the
probabilistic and the history-based ones. The former assumes that engagement in
active and passive cooperation has a combined probability of 1 (one); e.g., an
agent might engage in active cooperation with a probability of 0.7 and in
passive cooperation with a probability of 0.3. The history-based approach
enables the agent to decide, based on the present state of the environment and
on forecast mechanisms, which kind of cooperation to follow [2,3].
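The probabilistic scenario amounts to a single random draw per decision, as the
sketch below shows. The function name, the fixed seed and the sample size are
assumptions; the history-based scenario is not covered.

```python
import random


def choose_cooperation(p_active=0.7, rng=random):
    """Pick 'active' with probability p_active and 'passive' otherwise."""
    if not 0.0 <= p_active <= 1.0:
        raise ValueError("p_active must lie between 0 and 1")
    return "active" if rng.random() < p_active else "passive"


rng = random.Random(42)  # fixed seed so that the run is repeatable
sample = [choose_cooperation(0.7, rng) for _ in range(10_000)]
share_active = sample.count("active") / len(sample)
```

Since the passive probability is the complement of the active one, the two
always sum to 1, as the scenario requires.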

4 Conclusions 

The use of MAS for EC is extremely interesting. Real-world marketplaces can be
directly reflected by a multiagent architecture; i.e., through the use of
agents, a communication platform and a set of protocols, marketplace dealings
can be intuitively modeled. The model covers both traditional open markets
(e.g., fish markets, markets for fruits and vegetables) and more closed and
enterprise-directed markets (e.g., provisioning in the textile industry and
aeronautics) [1].

For a complete simulation and representation of real-world markets one must
ensure that certain entities, that are traditionally contending for the choice
of the clients' agents, can reach occasional agreements. These agreements may be
tacit or may involve an opinion statement; i.e., an agent cooperates with
another






agent by leaving the negotiation ground or, on the other hand, through a voting
mechanism, a series of agents is consulted on the definition of a common price.
Obviously, cooperation takes its toll. Agents are moved by a desire to improve
their personal satisfaction and an agreement is only reached if some kind of
compensation can be given in return. This compensation can be monetary or it
can be non-quantitative (establishment of future relationships).

The agents' opinion is important but a certain degree of fault-tolerance is
necessary in order for an agreement to be reached. MAS can draw from studies
conducted in the DS field. The LSPA is an interesting approach, though it can
lead to an extremely large exchange of messages.

also paramount. Agents must be armed not only with the tools for survival 
in a MAS environment (communication, identity, etc) but with the essential 
knowledge (defined in an internal KB) to evaluate decision paths. 

Future work should be centered on the definition of new algorithms that
ensure agreement in multi-community environments and on the implementation of
monitoring agents. These agents should try to counteract problems that arise
from inherent asynchrony. This asynchrony can lead to blackboard environments
littered with redundant or unnecessary data, which will inevitably lead
to performance degradation. The problem of exponential growth in the number
of messages exchanged through the LSPA can partially be solved by the
implementation of Dolev's algorithm [6].



References 

1. N. R. Jennings and M. J. Wooldridge (editors), Agent Technology: Foundations,
Applications, and Markets, Springer-Verlag, Berlin, Germany, 1998.

2. P. Cortez, M. Rocha, J. Machado and J. Neves, A Neural Network Based
Forecasting System, in Proceedings of ICNN'95 - International Conference on
Neural Networks, Perth, Western Australia, November 1995.

3. J. Neves, A Logic Interpreter to Handle Time and Negation in Logic Data
Bases, in Proceedings of the ACM'84 Annual Conference - The Fifth Generation
Challenge, San Francisco, California, USA, 1984.

4. A. Chavez and P. Maes, Kasbah: An Agent Marketplace for Buying and Selling
Goods, MIT Media Lab, Cambridge, MA 02139, 1996.

5. L. Lamport, R. Shostak and M. Pease, The Byzantine Generals Problem, ACM
Transactions on Programming Languages and Systems, July 1982.

6. D. Dolev, The Byzantine Generals Strike Again, Journal of Algorithms,
January 1982.



A Framework for the Development of 
Cooperative Configuration Agents 



A. Felfernig, G. Friedrich, D. Jannach, and M. Zanker 



Institut für Wirtschaftsinformatik und Anwendungssysteme
Universität Klagenfurt, A-9020, Austria
{felfernig,friedrich,jannach,zanker}@ifi.uni-klu.ac.at



Abstract. The integration of configuration systems to support supply 
chain integration of configurable products is still an open research issue. 
Current configurator approaches are designed for solving local configu- 
ration problems but there is no support for the integration of multiple 
configuration systems. In order to facilitate distributed configuration of 
customizable products we employ cooperating configuration agents ca- 
pable of managing requests and posting configuration subtasks to remote 
configuration agents. For integrating different knowledge representation 
formalisms of configuration agents we apply broadly used configuration 
domain specific modeling concepts to design shareable ontologies which 
can be interpreted by other agents. These concepts are defined as UML 
(Unified Modeling Language) stereotypes which can be automatically 
translated into a configuration agent’s knowledge representation. 



1 Introduction 

Knowledge-based configuration systems have a long history as a successful AI 
application area and today form the foundation for a thriving industry (e.g. 
telecommunication, automotive industry, computer industry, or financial ser- 
vices). A new challenge for the employment of configuration systems is their 
integration into the value chain of one or more product families (supply chain 
integration). While supply chain management of standardized, mostly well de- 
fined products is quite well supported by upcoming electronic marketplaces, 
auctioning mediators, or automatic purchasing systems, current configuration 
technology does not offer concepts and tools supporting supply chain integra- 
tion of configurable products. 

Configuration systems have likewise progressed from their rule-based origins 
to the use of higher level representations such as various forms of constraint satis- 
faction, description logics, or functional reasoning. This variety of representation 
formalisms causes incompatibilities between configurators and the question has 
to be answered how to integrate these systems. The increased use of knowledge- 
based configurators in various application domains as well as the increasingly 
complex tasks tackled by such systems ultimately lead to both the knowledge 
bases and the resulting configurations becoming larger and more complex. In this 
context, effective knowledge acquisition is crucial since configurator development 



R. Loganantharaj et al. (Eds.): IEA/AIE 2000, LNAI 1821, pp. 24-33, 2000.
© Springer-Verlag Berlin Heidelberg 2000






time is strictly limited, i.e. the product and the corresponding configuration sys- 
tem have to be developed in parallel. 

For the development of configuration agents, we show how these challenges 
can be met by using a standard design language (UML - Unified Modeling Lan- 
guage [12]) as notation in order to simplify the construction of a logic-based 
description of the domain knowledge as well as the construction of a common 
interpretable ontology which serves as basis for inter-agent communication. UML 
is widely applied in industrial software development as a standard design lan- 
guage. We employ the extension mechanism of UML (stereotypes) in order to ex- 
press configuration domain specific modeling concepts. The configuration agent 
development process is represented in Fig. 1. First we select a shared product 
model represented in UML (provided by a supplier) and a communication pro- 
tocol, which together constitute a shared ontology for cooperative configuration. 
The shared product model represents those component types and constraints 
of a supplier's configurable product, which should be visible for other configu- 
ration agents, i.e. can be integrated in their local product model. The shared 
product model is imported and integrated into the local product model which is 
designed in UML (1). After syntactic checks of the correct usage of the modeling 
concepts (2) this model is unambiguously translated into logical sentences (3).
The resulting configuration agent is validated by the domain expert using test 
runs on examples (4). In case of unexpected results, the product model can be 
revised (1), otherwise it can be employed in productive use. Additionally, if the 
generated configuration agent acts as a supplier as well, those components and 
constraints, which should be visible to customer agents must be made public, 
i.e. provided as a shareable product model (5). 




Fig. 1. Configuration Agent Development Process 



The paper is organized as follows. First we characterize a configuration task 
and give a formal definition of a distributed configuration task (Section 2). By 
giving an example from the domain of hardware / software configuration we dis- 
cuss commonly used concepts for modeling configuration domains and their defi- 
nition in UML. Further we show how these concepts can automatically be trans- 






lated into an executable logic representation (Section 3). In Section 4 we show 
how to integrate two configuration agents using XML (Extensible Markup Lan- 
guage) as communication language. In Section 5 we discuss the structure of our 
prototype system. Sections 6 and 7 contain related work and conclusions. 



2 Distributed Configuration Task 

In practice, configurations are built from a predefined catalog of component 
types of a given application domain. These component types (types) are described
through a set of properties (attributes) and connection points (ports)
representing logical or physical connections to other components. The attributes
have an assigned domain (dom). The domain description (DD) of a configuration
task contains this information and additional constraints on legal configurations.
The actual configuration problem has to be solved according to a set of specific
user requirements (SRS). Based on this definition of a local configuration task [5]
we define a Distributed Configuration Task through the following sets of logical 
sentences. 

— $DD = \bigcup_{i} DD_i$, where $DD_i$ is the DD of agent $i$ ($i \in \{1..n\}$ and $n$ is the
number of configuration agents¹).

— $SRS = \bigcup_{i} SRS_i$.

A configuration result is described through sets of logical sentences (COMPS,
ATTRS, CONNS). In these sets, the employed components, attribute values, and
established connections of a concrete customized product are represented.

— $COMPS = \bigcup_{i} COMPS_i$, where $COMPS_i$ represents sets of literals of the form
type(c,t). t is included in the set of types defined in $DD_i$. The constant c
represents the identifier of a component.

— $CONNS = \bigcup_{i} CONNS_i$, where $CONNS_i$ represents sets of literals of the form
conn(c1,p1,c2,p2). c1, c2 are component identifiers from $COMPS_i$. p1 (p2)
is a port of the component c1 (c2).

— $ATTRS = \bigcup_{i} ATTRS_i$, where $ATTRS_i$ represents sets of literals of the form
val(c,a,v), where c is a component identifier, a is an attribute of that component,
and v is the actual value of the attribute (selected from the domain
of the attribute).

The concept of a Consistent Distributed Configuration is defined as follows. 

Definition: Consistent Distributed Configuration. If (DD, SRS) is a
configuration problem and COMPS, CONNS, and ATTRS represent a configuration
result, then the configuration is consistent iff
$DD \cup SRS \cup COMPS \cup CONNS \cup ATTRS$ can be satisfied.

We specify that COMPS includes all required components, CONNS describes 
all required connections, and ATTRS includes a complete value assignment to all 



¹ Note that $DD_i$ can also include imported product models.






variables in order to achieve a complete distributed configuration². Let $AX_{comp}$
be the additional sentences for completeness purposes.

In order to assure completeness and correctness of the distributed configura- 
tion w.r.t. the overall configuration task the following sentence must hold: 

— $DD \cup SRS \cup COMPS \cup CONNS \cup ATTRS \cup AX_{comp}$ is consistent iff $\forall i$:
$DD_i \cup SRS_i \cup COMPS \cup CONNS \cup ATTRS \cup AX_{comp}$ is consistent.

This sentence is fulfilled if we allow in DD only sentences using type, conn, and
val literals, since $COMPS \cup CONNS \cup ATTRS \cup AX_{comp}$ is a complete theory
w.r.t. these literals. A distributed configuration, which is consistent and complete
w.r.t. the domain description and the customer requirements, is called a Valid 
Distributed Configuration. 
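
The consistency condition above can be illustrated with a small sketch. It is not the authors' implementation: the encoding of literals as Python tuples, the helper names (`Result`, `requires_type`, `locally_consistent`, `consistent`) and the two example constraints are assumptions made for illustration, and the connection part of the constraints is ignored here. Each $DD_i$ (together with $SRS_i$) is modelled as a list of predicates over the shared result, and the overall configuration is accepted iff every agent's local check succeeds.

```python
from typing import Callable, Dict, FrozenSet, List, NamedTuple, Set, Tuple


class Result(NamedTuple):
    """A configuration result as three sets of literals (hypothetical encoding)."""
    comps: FrozenSet[Tuple[str, str]]             # type(c, t)
    conns: FrozenSet[Tuple[str, str, str, str]]   # conn(c1, p1, c2, p2)
    attrs: FrozenSet[Tuple[str, str, object]]     # val(c, a, v)


Constraint = Callable[[Result], bool]


def types_in(result: Result) -> Set[str]:
    """All component types that occur in the result."""
    return {t for (_c, t) in result.comps}


def requires_type(x: str, alternatives: List[str]) -> Constraint:
    """If a component of type x is present, one of the alternatives must be too."""
    def check(result: Result) -> bool:
        present = types_in(result)
        return x not in present or any(t in present for t in alternatives)
    return check


def locally_consistent(dd_i: List[Constraint], result: Result) -> bool:
    """Agent i checks its own constraints against the shared result."""
    return all(constraint(result) for constraint in dd_i)


def consistent(dd: Dict[str, List[Constraint]], result: Result) -> bool:
    """The distributed configuration is consistent iff every agent agrees."""
    return all(locally_consistent(dd_i, result) for dd_i in dd.values())


# Two agents with one illustrative constraint each.
dd = {
    "hardware": [requires_type("motherboard-2", ["p-II"])],
    "software": [requires_type("desktop-publishing", ["os-2", "os-3"])],
}
ok = Result(
    comps=frozenset({("mb", "motherboard-2"), ("cpu", "p-II"), ("os", "os-1")}),
    conns=frozenset(),
    attrs=frozenset(),
)
bad = ok._replace(comps=ok.comps | {("dtp", "desktop-publishing")})
```

With these definitions, `ok` is accepted by both agents, while `bad` is rejected by the software agent only, so the hardware agent learns about the conflict only through communication.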

In order to calculate solutions for a given distributed configuration task we 
employ asynchronous backtracking [14], which offers the basis for bounded learn- 
ing strategies supporting the reduction of search efforts. This efficient revision of 
requirements and design decisions is of particular interest for integrating config- 
urators, since supplier agents eventually discover conflicting requirements (no- 
goods) which must be communicated back to the requesting configurator. 

3 Construction of Configuration Agents 

In order to coordinate the distributed configuration process and to permit knowledge
interchange between configuration agents, parts of the agents' knowledge
bases must be shared. For representing the shared configuration knowledge we 
employ configuration domain specific modeling concepts defined as UML stereotypes.
Note that UML is not a precondition for designing product structures but
eases construction and maintenance of these models, as has been shown in [4].
Configuration models that are designed using these concepts can be translated
into the logic representation of a specific configuration agent. 

The shared knowledge together with an agreed upon communication protocol 
are the constitutive parts of a common ontology which is the prerequisite for the 
communication between configuration agents. In [2] different interpretations of 
ontologies in AI are discussed. First, ontologies are content theories about the 
sorts of objects, properties of objects, and relations between objects that are 
possible in a specified domain of knowledge. They provide the potential terms 
for describing the knowledge about the domain. Second, ontologies are a body of 
knowledge describing some domain using a representation vocabulary. Exactly 
this second interpretation is represented in our approach to distributed configuration.
In the following we briefly discuss the configuration domain specific
modeling concepts expressed as UML stereotypes and their translation into an
executable logic representation³.

² This is accomplished by additional logical sentences which can be generated using
the domain description (see [5] for more details).

³ A detailed discussion of the translation rules can be found in [4].






We do not employ any configuration agent specific knowledge representation, 
but propose a translation into the component port model formalism, which is 
well established for modeling and solving configuration problems [7]. In gen- 
eral, consistency-based tools based on this model can use the logic theory de- 
rived from the UML model. Fig. 2 shows the knowledge bases of two config- 
uration agents (hardware and software configuration agent). The shared con- 
figuration knowledge is identified by emphasized components and constraints 
between these components (software-package, development-environment,
desktop-publishing, os, os-1, os-2, os-3, text-editor). Exactly these components are
incorporated into the local configuration model of the hardware configuration agent. In
order to understand the graphical notation we briefly define the basic modeling
concepts specific for the configuration domain and show the translation into the 
component port representation. 

— Component Types. We use a stereotype class for representing component
types since some limitations on these classes have to hold, e.g. there are no
methods, and attributes are limited to simple data types and enumerations. The
component types of the hardware configuration agent in Fig. 2 are translated
into the following types, attributes, and ports definitions:

types = {pc, floppy-unit, hd-unit, scsi, ...}.
attributes(pc) = {maxprice}, ...
dom(pc, maxprice) = {0..10000}, ...
ports(pc) = {floppy-unit-port, hd-unit-port, motherboard-port, ...}.

— Requires. X requires Y means that if X is part of the product, then Y must
be part of the product too. The constraint motherboard-2 requires p-II is
translated as follows:

$type(ID1, \text{motherboard-2}) \Rightarrow \exists (ID2)\; type(ID2, \text{p-II}) \wedge conn(ID2, \text{motherboard-port}, ID1, \text{cpu-port}).$

— Incompatible. X incompatible Y denotes the fact that two components
cannot be used within the same configuration. The incompatible relation is
defined as a binary relation with a multiplicity of 1..1 in the UML model.
The constraint motherboard-1 incompatible p-I is translated as follows:

$type(ID1, \text{motherboard-1}) \wedge type(ID2, \text{p-I}) \wedge conn(ID2, \text{motherboard-port}, ID1, \text{cpu-port}) \Rightarrow false.$

— Resources. Resources impose additional constraints on the possible product
structure. Some components can contribute to a resource whereas others
consume a specified quantity of the resource. In an actual configuration
the resources must be balanced, i.e. the consumed resources must
not exceed the provided resources. The contribution and consumption of a
resource is modeled through the relations consumes and produces. In our example,
the price of a pc component must be less than or equal to the maximum
price imposed by the customer (maxprice in Fig. 2).






— Ports and Connections. Not only the quantity and kind of the employed
components can be important, but also how different components are connected
to each other. Components can be connected through connection
points (ports). One port can only be connected to exactly one other port,
e.g. at-bus-connector is connected to at-bus-slot (Fig. 2).




Fig. 2. Configuration Knowledge Bases 



— Additional modeling concepts and constraints. The discussed modeling
concepts have been shown to cover a wide range of application areas for
configuration [11]. Despite this, some application areas might have a need
for special modeling concepts not covered so far. UML stereotypes are an
extension mechanism with which further modeling concepts can be built into
UML by adapting the corresponding meta model. In order to define restrictions
on the usage of the stereotypes, well-formedness rules can be defined in
OCL (Object Constraint Language), which is an integral part of UML.
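
To make the translated constraints above more concrete, the following sketch checks the requires, incompatible, and resource concepts against a candidate configuration. It is only an illustration under stated assumptions: the function names (`check_requires`, `check_incompatible`, `check_resource`), the tuple encoding of the literals, and the treatment of connections as symmetric are not taken from the paper.

```python
def ids_of(comps, component_type):
    """Identifiers of all components with the given type; comps holds (c, t) pairs."""
    return {c for (c, t) in comps if t == component_type}


def connected(conns, c1, p1, c2, p2):
    """True if conn(c1,p1,c2,p2) holds; either orientation is accepted (assumption)."""
    return (c1, p1, c2, p2) in conns or (c2, p2, c1, p1) in conns


def check_requires(comps, conns, x, y, port_y, port_x):
    """type(ID1,x) => exists ID2: type(ID2,y) and conn(ID2,port_y,ID1,port_x)."""
    return all(
        any(connected(conns, id2, port_y, id1, port_x) for id2 in ids_of(comps, y))
        for id1 in ids_of(comps, x)
    )


def check_incompatible(comps, conns, x, y, port_y, port_x):
    """type(ID1,x) and type(ID2,y) and conn(ID2,port_y,ID1,port_x) => false."""
    return not any(
        connected(conns, id2, port_y, id1, port_x)
        for id1 in ids_of(comps, x)
        for id2 in ids_of(comps, y)
    )


def check_resource(consumed, provided):
    """Resources are balanced if total consumption does not exceed total provision."""
    return sum(consumed) <= sum(provided)


# Example: motherboard-2 requires p-II (component identifiers are made up).
comps = {("mb", "motherboard-2"), ("cpu", "p-II")}
conns = {("cpu", "motherboard-port", "mb", "cpu-port")}
requires_ok = check_requires(
    comps, conns, "motherboard-2", "p-II", "motherboard-port", "cpu-port"
)

# Example: motherboard-1 incompatible p-I, violated by this configuration.
bad_comps = {("mb", "motherboard-1"), ("cpu", "p-I")}
incompatible_ok = check_incompatible(
    bad_comps, conns, "motherboard-1", "p-I", "motherboard-port", "cpu-port"
)

# Example: component prices (made-up values) against the customer's maxprice.
price_ok = check_resource(consumed=[600, 250], provided=[1000])
```

A real configurator would let a constraint solver search for component and connection literals that satisfy these sentences; the sketch only verifies a configuration that is already given.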










Besides the definition of the shared configuration knowledge, the model of the
configuration agents' dynamics is the second major part of a common ontology.
In order to model the allowed states of an agent’s configuration process we 
employ state charts. State transitions are triggered by messages from (to) remote 
configuration agents. A simple example of an agent communication protocol is 
given in Fig. 3. First, the local agent (B) receives a request for configuration 
from a remote configuration agent denoted as agent A (1). B either calculates a 
consistent configuration and returns the result to A (2) or posts a configuration 
subtask to agent C (3), or is not able to find a consistent solution (6). If B receives 
a consistent subconfiguration from C (4), it continues the local search process 
and returns a found solution to A. If C does not find a consistent solution (5), B 
tries to calculate an alternative solution considering the nogoods from C⁴. If A
accepts the solution received from B, this acceptance is communicated to C (7), 
otherwise A does not accept the found solution (8). 



[Figure: state chart of agent B. Recoverable transition labels: fromA.requestconfig; fromA.accept; [self.configstopped] / toC.requestconfig; fromC.completeconfig; fromC.nogoods; toA.nogoods; toC.noaccept]



Fig. 3. Agent Communication Protocol 
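
The protocol of agent B can be read as a small state machine. The sketch below is one possible reading of the description and not the state chart from the paper: the state names (`idle`, `configuring`, `waiting_for_C`, `awaiting_answer`) and the events `toA.completeconfig` and `fromA.noaccept` are assumptions; the remaining event names follow the labels that survive in Fig. 3. The numbers in the comments refer to the steps in the text.

```python
class ProtocolError(Exception):
    """Raised when a message is not allowed in the current state."""


# (state, event) -> next state; state names are hypothetical.
TRANSITIONS = {
    ("idle", "fromA.requestconfig"): "configuring",            # (1)
    ("configuring", "toA.completeconfig"): "awaiting_answer",  # (2)
    ("configuring", "toC.requestconfig"): "waiting_for_C",     # (3)
    ("waiting_for_C", "fromC.completeconfig"): "configuring",  # (4)
    ("waiting_for_C", "fromC.nogoods"): "configuring",         # (5)
    ("configuring", "toA.nogoods"): "idle",                    # (6)
    ("awaiting_answer", "fromA.accept"): "idle",               # (7)
    ("awaiting_answer", "fromA.noaccept"): "idle",             # (8)
}


class AgentB:
    """Tracks which messages agent B may send or receive next."""

    def __init__(self):
        self.state = "idle"
        self.history = []

    def fire(self, event):
        """Apply one message event or reject it if the protocol forbids it."""
        key = (self.state, event)
        if key not in TRANSITIONS:
            raise ProtocolError(
                f"event {event!r} is not allowed in state {self.state!r}"
            )
        self.state = TRANSITIONS[key]
        self.history.append(event)
        return self.state


# A run in which C delivers a subconfiguration and A accepts the result.
agent = AgentB()
for event in [
    "fromA.requestconfig",
    "toC.requestconfig",
    "fromC.completeconfig",
    "toA.completeconfig",
    "fromA.accept",
]:
    agent.fire(event)
```

Keeping the protocol in a single transition table makes it easy to compare against the shared state chart and to reject messages that arrive out of order.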



4 Example: Distributed Configuration Using XML 

In order to show the interaction process between configuration agents based on 
the shared ontology discussed in Section 3 we give a simple example for config- 
uring computer systems. For supporting knowledge interchange between the in- 
volved configuration agents we employ XML (Extensible Markup Language [13]) 
as agent communication language, which is a standard for information exchange
on the Internet. In order to enable the exchange of configuration knowledge via
XML we need to translate the common ontology modeled in UML (see Sec- 
tion 3) into XML DTDs which define the structure of exchangeable XML mes- 
sages between the configuration agents. DTDs are primarily used for checking 
the conformity of received XML documents containing configuration data with 
the defined ontological commitments. 

In the following example XML documents represent the relevant configu- 
ration information which is exchanged between the agents. While agent B is 

⁴ Nogoods represent conflicting value instantiations causing contradictions in C's
knowledge base.






capable of configuring computer hardware, agent C calculates software configu- 
rations. In the following scenario B is contacted by a customer for configuring 
a pc system. C is responsible for configuring the software-package since detailed
knowledge about software configuration is not available for B. 

First, B is contacted by the user via the user interface. The customer orders
one floppy-unit, one eide-unit with one hdd-eide-1, a motherboard-2, a p-II
cpu, and a software-package consisting of an os-1 and a desktop-publishing system.
The floppy-unit, motherboard, eide-unit, and cpu are configured without
communicating with other configuration agents. For configuring the software-package
B contacts C. The components os-1 and desktop-publishing are sent as
initial (partial) configuration to C as follows:

<requestconfig>
  <software-package> <os-1/> <desktop-publishing/>
  </software-package>
</requestconfig>

C tries to expand the partial configuration and detects the incompatibility
between os-1 and desktop-publishing since a desktop-publishing system
requires either an os-2 or an os-3. The following message is sent from C to B.

<nogoods>
  <software-package> <os-1/> <desktop-publishing/>
  </software-package>
</nogoods>

B includes the nogoods as constraints into its local knowledge base. Since B 
is unable to calculate a consistent solution, the nogoods are presented to the 
customer. The customer decides to order os-1 without a corresponding desktop- 
publishing system, i.e. the following message is sent from B to C. 

<requestconfig>
  <software-package> <os-1/>
  </software-package>
</requestconfig>

C accepts B's configuration and sends the following message to B.

<completeconfig>
  <software-package> <os-1> <price> 100 </price> </os-1>
  </software-package>
</completeconfig>

B accepts the configuration by sending an <accept/> message to C.
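
The message exchange above can be reproduced with a few lines of code. The sketch uses Python's standard `xml.etree.ElementTree` module; the helper names (`build_request`, `parse_message`, `price_of`) are hypothetical, and the messages cover only the content portion shown in the text, not the message and communication layers. No DTD validation is performed here.

```python
import xml.etree.ElementTree as ET


def build_request(components):
    """Build a <requestconfig> message for a software-package (content layer only)."""
    root = ET.Element("requestconfig")
    package = ET.SubElement(root, "software-package")
    for name in components:
        ET.SubElement(package, name)
    return ET.tostring(root, encoding="unicode")


def parse_message(xml_text):
    """Return the message kind and the component names inside software-package."""
    root = ET.fromstring(xml_text)
    package = root.find("software-package")
    names = [child.tag for child in package] if package is not None else []
    return root.tag, names


def price_of(xml_text, component):
    """Read the price of one component from a <completeconfig> message."""
    root = ET.fromstring(xml_text)
    text = root.findtext(f"software-package/{component}/price")
    return int(text.strip()) if text is not None else None


# B's first request and C's replies, as in the example above.
request = build_request(["os-1", "desktop-publishing"])
nogoods = (
    "<nogoods><software-package><os-1/><desktop-publishing/>"
    "</software-package></nogoods>"
)
complete = (
    "<completeconfig><software-package><os-1><price> 100 </price></os-1>"
    "</software-package></completeconfig>"
)

kind, conflicting = parse_message(nogoods)
```

Agent B could turn the `conflicting` list into a local constraint before searching for an alternative, which mirrors how the nogoods are used in the scenario.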

5 Prototype Development Environment 

In order to realize the concepts discussed in this paper we have implemented a 
prototype development environment for cooperative configuration agents. The 

⁵ Messages are divided into three different layers conforming to the Knowledge Sharing
Effort (KSE) model of knowledge interchange (content, message, and communication
layer). Due to space limitations we only present portions of the communicated
XML documents.






CASE tool Rational Rose is employed for specifying the agents' product model
as well as the communication protocol. To ease model interchange
between different modeling environments we translate the agent models specified
in Rational Rose into a neutral XMI (XML Metadata Interchange [10])
representation which is further translated into the knowledge representation of
the corresponding configuration agents. In order to calculate configurations we
employ an industrial-strength configuration engine (ILOG Configurator C++
library). Configuration agents are implemented as distributed COM (Compo- 
nent Object Model) objects. The translation from XML to C++ is done by 
integrating the IBM XMLC XML parser. 

6 Related Work 

In order to solve large scale configuration problems, [3] propose a network of de- 
sign agents that represent part catalogs and design constraints. The whole prob- 
lem is decomposed into sub-problems of manageable size which are solved by the 
agents. Designing large scale products requires the cooperation of a number of 
different experts. In the SHADE (Shared Dependency Engineering) project [9] 
a KIF [8] formalism was used for representing engineering ontologies. Giving an 
example of a spring construction, the integration of a project engineering agent 
responsible for the definition of the component hierarchy and basic properties 
of mechanical components, a spring design agent responsible for the design of 
the detailed technical structure and an optimization agent responsible for optimization
tasks is shown. The main focus of [3] is on efficient distributed design
problem solving and [9] concentrates on the integration of different design views,
whereas our concern is to provide effective support of distributed configuration
problem solving, where knowledge is distributed between different agents having
a restricted view on the whole configuration process.

For the exchange of knowledge between agents in a Web-based environment
XML is an adequate representation language since it is an evolving standard for
information interchange on the Internet. XML standards are defined for various
application areas, e.g. XMI is a standard for exchanging models defined in a
modeling language conforming to the MOF standard (Meta Object Facility).

Automated generation of logic-based knowledge bases through translation 
of domain specific modeling concepts expressed in terms of a standard design 
language like UML has not been discussed so far. Comparable research has been 
done in automated and knowledge-based Software Engineering [6]. [1] define 
a formal semantics for object model diagrams based on OMT for supporting 
assessments of requirement specifications. Our work is complementary since our 
goal is the generation of executable logic descriptions. 

7 Conclusions 

In this paper we have presented a UML-based environment for the construction
of cooperative configuration agents. The integration of businesses by Internet
technologies boosts the demand for integrative coordination of knowledge-based
systems, such as configuration systems, which have to move from standalone
to cooperative systems. In this context the agent paradigm is a
suitable approach since each configurator can be seen as an autonomously acting
entity, receiving requests and demanding customizable parts. A precondition for
integrating configuration agents is the definition of a common ontology using
concepts interpreted in the same way by the configuration agents. In order to
integrate configuration agents in an Internet environment, XML is the appropriate
language for exchanging knowledge between agents, since it is a widespread
standard and easy to integrate in Web-based applications.



References 

1. R.H. Bourdeau and B. Cheng. A Formal Semantics of Object Models. IEEE
Transactions on Software Engineering, 21,10:799-821, 1995.

2. B. Chandrasekaran, J. Josephson, and R. Benjamins. What Are Ontologies, and
Why do we Need Them? IEEE Intelligent Systems, 14,1:20-26, 1999.

3. T.P. Darr and W.P. Birmingham. An Attribute-Space Representation and Algorithm
for Concurrent Engineering. AIEDAM, 10,1:21-35, 1996.

4. A. Felfernig, G. Friedrich, and D. Jannach. UML as domain specific language
for the construction of knowledge-based configuration systems. In Proceedings of
SEKE'99, pages 337-345, Kaiserslautern, Germany, 1999.

5. G. Friedrich and M. Stumptner. Consistency-Based Configuration. In AAAI Workshop
on Configuration, Technical Report WS-99-05, pages 35-40, Orlando, Florida,
1999.

6. M. Lowry, A. Philpot, T. Pressburger, and I. Underwood. A Formal Approach to
Domain-Oriented Software Design Environments. In Proceedings 9th Knowledge-Based
Software Engineering Conference, pages 48-57, Monterey, CA, 1994.

7. S. Mittal and F. Frayman. Towards a Generic Model of Configuration Tasks. In
Proceedings of the 11th IJCAI, pages 1395-1401, Detroit, MI, 1989.

8. R. Neches, R. Fikes, T. Finin, T. Gruber, R. Patil, T. Senator, and W. Swartout.
Enabling technology for knowledge sharing. AI Magazine, 12,3:36-56, 1991.

9. G.R. Olsen, M. Cutkosky, J.M. Tenenbaum, and T.R. Gruber. Collaborative Engineering
based on Knowledge Sharing Agreements. In Proceedings of ACME
Database Symposium, pages 11-14, Minneapolis, MN, USA, 1994.

10. Object Management Group (OMG). XMI Specification, www.omg.org, 1999.

11. H. Peltonen, T. Männistö, T. Soininen, J. Tiihonen, A. Martio, and R. Sulonen.
Concepts for Modeling Configurable Products. In Proceedings of European Conference
Product Data Technology Days, pages 189-196, Sandhurst, UK, 1998.

12. J. Rumbaugh, I. Jacobson, and G. Booch. The Unified Modeling Language Reference
Manual. Addison-Wesley, 1998.

13. W3C. Extensible Markup Language (XML), www.w3.org, 1999.

14. M. Yokoo, E.H. Durfee, T. Ishida, and K. Kuwabara. The distributed constraint
satisfaction problem. IEEE Transactions on Knowledge and Data Engineering,
10,5:673-685, 1998.



A revised and extended version of this paper will appear in the International Journal 
of Software Engineering and Knowledge Engineering (IJSEKE). 



Java-Based Distributed Intelligent Agent Architecture 
for Building Safety-Critical Tele-Inspection Systems on 

the Internet 



Jae-Chul Moon¹, Soon-Ju Kang¹, and Nam-Seog Park²

¹School of Electronics and Electrical Engineering, Kyungpook National University, Korea
vgate@palgong.knu.ac.kr
sjkang@ee.knu.ac.kr
²Information Technology Lab., GE Corporate R & D
parkns@crd.ge.com



Abstract. The inspection of SG (Steam-Generator) tubes in an NPP (Nuclear
Power Plant) is a time-consuming, laborious, and safety-critical task because of
several serious constraints including a highly radioactive working environment,
tight task schedule, and the need for many highly qualified human inspectors. In
order to realize this kind of safety-critical and complex inspection using the
Internet, intelligent agents based on Java distributed technologies were
designed. The proposed agent architecture, which is a declarative commanding
concept based on a layered architecture, was introduced to minimize the effect
of Internet latency and maximize the reliability of an inspection system using
tele-operation via the Internet.



1 Introduction 

Implementing tele-operation on the Internet overcomes the space limitation in
utilizing or sharing various resources. Already, many researchers have shown
how an Internet-based tele-operation technique can be used in various areas of
application [1-4]. Mark Cox used this technique to schedule and take a picture of the
universe on the WWW (World-Wide Web) using a telescope at a remote site [1]. Ken
Goldberg demonstrated how a SCARA-type robot could be manipulated on the
WWW [2]. Eric Paulos made it possible to watch a live exhibition on the web by the
remote control of a camera at an exhibition center [3]. Rich Wallace also showed how
to manipulate a remote camera on the Internet [4].

In spite of the success of the above studies in showing the practicality of web-based
tele-operations, these studies also revealed some weaknesses that need to be
resolved before this technique can be applied to safety-critical areas such as nuclear
power plants. First of all, in all previous studies, a human operator is required to 
immediately react through the Internet to a status change in the remote working space. 
This manual operation on the Internet is difficult, time-consuming, and expensive, 
due to unreliable Internet delays when an operator has to deal with several remote 
situations. Furthermore, unexpected hardware failures or environmental interactions 
during an operation can also cause the controlled-device to diverge from its expected 

R. Loganantharaj et al. (Eds.): IEA/AIE 2000, LNAI 1821, pp. 34-45, 2000.

© Springer-Verlag Berlin Heidelberg 2000







state. As a result, the controlled-device may then halt the operation and enter into a 
safe mode requiring additional time-consuming and expensive human intervention. 

Accordingly, to solve these problems, software architecture for a distributed 
intelligent agent that can build a safety-critical tele-operation on the Internet was 
proposed. To achieve this goal, first, the design requirements for building a tele- 
operation system for a safety critical application were defined. Then, to satisfy these 
requirements, a declarative command-processing concept for tele-inspection was 
designed to minimize the human intervention and increase the reliability of the 
software. This concept was then realized by developing agent software for
accepting and executing declarative commands using intelligent technologies, with an
exception-handling mechanism for hardware failures, unexpected environmental
interactions, and errors or inconsistencies in the command set. Finally, using Java
technology the agent software was distributed over the Internet, particularly the 
WWW. As a case study of a safety-critical application, the development of an 
intelligent agent-based tele-inspection system for steam-generator tubes in a nuclear 
power plant was targeted. 

Section 2 of this paper introduces the SG tube inspection problems in a NPP as a 
domain description. The design considerations of tele-operation in safety critical 
applications are discussed in section 3. Section 4 presents the proposed software 
architecture along with details of each subsystem. Section 5 outlines the 
implementation and experimental analysis, with the final conclusions given in 
section 6. 



2 Description of Domain Application 



2.1 Overview of Steam-Generator Tube Inspection in Nuclear Power Plant 

A nuclear power plant has two or more steam generators. Each steam generator 
contains several thousand U-shaped tubes with diameters of one to two centimeters. 
Steam generators are used for the heat exchange between the primary coolant and the
secondary coolant. However, once a nuclear power plant is in operation, the tubes in
the steam generators gradually wear out due to internal and external factors including
vibration, corrosion, denting, wear scar, etc. In order to maintain the integrity of the
tubes, a nuclear power plant is shut down roughly once a year and repair work
typically takes four to six weeks. Since a steam generator tube inspection is carried out
during this period, the inspection process must be well planned and efficient so that
the total tube inspection time does not exceed the planned shutdown period. 

A Steam-Generator (SG) tube inspection [5] consists of three major processes: the
maintenance planning process, the non-destructive signal acquisition process
performed by controlling an inspection robot, and the signal evaluation process. The
planning process is conducted to ensure the accuracy and optimize the planning of the
inspection. Since there are thousands of tubes in each steam generator, not all tubes
can be tested during one inspection period. Therefore, the inspection planning







process needs to have some intelligent criteria by which tubes are selected for
inspection. During the signal acquisition process, a robot arm equipped with a camera
and signal acquisition probe is used. The robot arm positions the probe directly
below the target tube, then inserts the probe through to the other end of the tube. When the
probe is pulled back, the signal acquisition hardware collects the data signal from the
probe sensor and stores it on tape cartridges or a rewritable CD-ROM. During this
process, a local inspection robot control agent also runs in the inspection equipment
control computer system. In contrast, the signal evaluation process is carried out
separately by a qualified human expert who determines whether a tube has a defect
or not according to any peculiar signal patterns appearing in the acquired signal [5].




2.2 Necessity of Internet-Based Tele-Inspection System 

To perform NPP steam-generator inspections, many human inspectors have to move 
from one power plant to another along with heavy inspection equipment requiring two 
or three container trunks, in addition to the inspection robot systems that are 
individually installed in each power plant. In particular, when there is an overlapping 
inspection schedule between several power plants at the same time, the logistics for 
managing this type of operation plus the equipment are quite complex. Therefore, to 
overcome this situation, a WWW-based tele-inspection center on the Internet/Intranet
has been proposed [5]. Here, Intranet means a utility-owned private computer network.

As shown in Figure 1, the inspectors are able to connect to the inspection server 
through the web server to plan the inspection schedule and evaluate the data signals. 
However, because the signal acquisition process is still carried out in the conventional 
way, the entire inspection procedure cannot be performed at the inspection center or 
where the human inspectors are located. Therefore, this prompted the development of 
a tele-inspection system of SG tubes that uses a web-based tele-operation technique. 






3 Design Requirements for Agent-Based Tele-Inspection on the 
Internet 

a. Declarative commanding: In an Internet environment, the network cannot support
QoS (Quality of Service). As a result, the direct control of a hardware component from
a remote site becomes more difficult. In addition, even in networks that
support QoS, such as ATM, the control data has a round-trip delay that disturbs the
real-time response required to cope with a dynamically changing working
environment. An open-loop, sequence-based commanding architecture that requires
direct control by a human operator at a remote site is therefore unsuitable, and a
closed-loop, declarative-commanding architecture that can perform
real-time control inside the software itself with some level of autonomy is preferable.
In this declarative-commanding architecture, a set of primitive hardware
control commands is abstracted into a single declarative command. As a result,
the network performance is increased by reducing the amount of information and
control commands transmitted over the Internet, and the system reliability is increased
by minimizing the communication errors that result in bad commands to the hardware
component.

b. Exception handling: As with declarative commanding, sending primitive
exceptions through the Internet can result in misoperation of the robot. For
example, if a collision-expected exception occurs, the robot motion must be stopped.
Stopping the robot motion requires a real-time response to prevent a
collision; therefore, the exception should be treated in real time by the software itself,
and not by a remote user. In contrast, if a motor-failure exception occurs, the control
server is unable to deal with the situation and has to send an alarm to the remote user. Even
in this case, the software should try to stop the robot motion to minimize any damage
to the robot. As these examples show, exceptions should be filtered and dealt with
locally by the control server to realize a more reliable system.



4 Design Architecture of Proposed Concept 



4.1 Overview of Proposed System Configuration 

As mentioned in Section 2, the SG inspection process is divided into two major
processes. In order to perform these processes, three separate agents were designed.
Briefly, the inspection planning process is performed by an IPA (Inspection Planning
Agent), and the signal acquisition through remote control of the inspection robot is then
performed by the cooperation of an IRCA (Inspection Robot Control Agent) and an
AVA (Auxiliary Vision Agent). These agents are distributed on the Internet; however,
the IPA and IRCA need to be able to communicate through a rigorous communication
channel, and inter-operation with a server-side database such as an inspection
history database is also required. To solve this problem, a Java-based distributed
object technology called RMI (Remote Method Invocation) [6] is adopted, which allows
the robot to be operated from a web browser while also supporting inter-operation with a
server-side database. Another benefit of using Java RMI is that it supports a rigorous
TCP/IP connection, unlike legacy WWW techniques such as CGI [7]. An
MBONE [8] server and client are used to display a motion image of the robot movement
to a human inspector. As shown in Figure 2, the web server is used to transfer the IPA,
implemented as a Java applet, to a client web browser. The inspection robot model
selected was the SM-10 [9], which is the most popular robot in this domain and is controlled
through the IEEE488 protocol. In order to retrieve inspection history information, a
JDBC-based database interface is used. Most of these parts are implemented in
Java, except for the IEEE488 device driver.

As shown in Figure 2, the tele-inspection procedure for the SG tubes in the
proposed architecture is as follows. First, a human inspector connects to the URL
of the inspection web server using a normal web browser such as Netscape or
IE. Once the connection is made, the IPA is downloaded to the client browser, because the
agent is implemented as a Java applet.




Fig. 2. Remote agent configuration embedded in SG tube inspection server



Now, the IPA can intelligently perform the planning task, with minimum intervention
from the human inspector, based on the particular knowledge base owned by the agent
combined with the inspection history data retrieved from the inspection history
database installed on the server side. The main mission of the IPA is to
select the tubes that need to be tested and determine the job schedule for the selected
tubes. Using the output of the planning task, the IPA then connects to the IRCA to acquire
the non-destructive testing signal of each tested tube according to the planned job
schedule. At this time, the IPA issues a declarative control command to the IRCA to
position the inspection robot. Since the control command is highly abstracted, the
IRCA must translate the high-level command into low-level commands that the robot
hardware can understand directly. In addition to this translation, the IRCA must also
perform a reverse abstraction for unexpected exception flows or fault conditions, and
must then report these abstracted exceptions to the IPA or the human inspector.
Because the main mission of the IRCA is the precise control of the robot position, the
IRCA converts the abstracted command issued by the IPA, for example "go to the
(row, col) tube", into lower-level commands such as the activation of the pole motor
or arm motor attached to the inspection robot, and then supervises several real-time
threads in order to execute the lower-level commands. However, because a robot is a
mechanically operated device, there is always a calibration error in the robot
positioning; therefore, a calibration sequence is required whenever the robot is
moved. By independently serving the vision system of the robot, the AVA can clearly
identify the calibration error wherever the robot is positioned by interpreting the
image captured by the camera installed in the robot head. Thereafter, the AVA will
automatically perform the compensation process if needed. The following sections
explain the detailed design architecture of the three agents.
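
The declarative interface between the IPA and the IRCA described above can be
sketched as a Java RMI remote interface. This is only an illustration under stated
assumptions: the paper does not list its interface, so the type names
(InspectionRobotControl, InspectionException, RecordingRobotControl) and the method
names are hypothetical. Only java.rmi.Remote and java.rmi.RemoteException are real
RMI types. The recording class stands in for the real IRCA so that the sketch can be
run without robot hardware or an RMI registry.

```java
import java.rmi.Remote;
import java.rmi.RemoteException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical abstracted exception that the IRCA reports upward (name not from the paper). */
class InspectionException extends Exception {
    InspectionException(String message) {
        super(message);
    }
}

/** Hypothetical remote interface: only declarative commands cross the network. */
interface InspectionRobotControl extends Remote {
    void goToTube(int row, int col) throws RemoteException, InspectionException;

    void insertProbe() throws RemoteException, InspectionException;

    void acquireSignal() throws RemoteException, InspectionException;
}

/** In-process stand-in that records the commands instead of driving hardware. */
class RecordingRobotControl implements InspectionRobotControl {
    private final List<String> issued = new ArrayList<>();

    @Override
    public void goToTube(int row, int col) {
        // A real IRCA would translate this into low-level motor commands.
        issued.add("goToTube(" + row + "," + col + ")");
    }

    @Override
    public void insertProbe() {
        issued.add("insertProbe");
    }

    @Override
    public void acquireSignal() {
        issued.add("acquireSignal");
    }

    /** Commands received so far, in order. */
    List<String> issued() {
        return issued;
    }
}
```

One design point this makes visible: a single call such as goToTube(3, 7) replaces a
stream of motor-level messages, which is where the reduction in network traffic
claimed in Section 3 would come from.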

4.2 Inspection Planning Agent (IPA)

The main mission of the IPA is to select the tubes that need to be inspected and
determine the inspection sequence and inspection method, because it is impossible to
inspect all the tubes during a scheduled outage. In order to establish an optimal
inspection plan, an analysis of the previous inspection history and the present state of
each tube must be completed. However, this involves a great deal of data and is
hard to handle manually.

The IPA is developed based on the Java-based expert system concept [10] as a tool
to minimize the SG inspection time by providing both speed and accuracy in the
planning stage of the inspection. The IPA simultaneously determines the optimal
inspection method, the optimal inspection sequence, and the tubes to be inspected, as well as
analyzing the previous inspection history. The tubes to be tested include those tubes
whose estimated defect level exceeds a predetermined threshold or whose defect level
is continuously increasing. Tubes neighboring a defective tube, plus tubes that have not
been inspected over the last few years, are also included among the tubes that need to be
tested. The expert knowledge base provided by the human expert establishes these
rules for the selection of the tubes to be inspected. Similar to a human planner, the
IPA requires a variety of information such as the SG design specification, results of
previous inspections, operating history, repair history, technical specifications, and
regulatory guidelines. Much of this information resides in the inspection history
database in the inspection server. Because the IPA is implemented as a Java applet,
an inspector can remotely access the IPA through the Internet using a web browser.
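
The four selection criteria listed above can be written down as a small filter. The
following is a minimal sketch, not the authors' rule base: the TubeRecord fields, the
reading of "defective" as "latest level above the threshold", the eight-neighbor
adjacency, and the maxYears parameter are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical per-tube record; field names are illustrative only. */
class TubeRecord {
    final int row;
    final int col;
    final double[] defectLevels;    // estimated defect level per past inspection, oldest first
    final int yearsSinceInspection;

    TubeRecord(int row, int col, double[] defectLevels, int yearsSinceInspection) {
        this.row = row;
        this.col = col;
        this.defectLevels = defectLevels.clone();
        this.yearsSinceInspection = yearsSinceInspection;
    }

    double latestLevel() {
        return defectLevels.length == 0 ? 0.0 : defectLevels[defectLevels.length - 1];
    }
}

class TubeSelector {
    /** True when every recorded level is higher than the one before it. */
    static boolean steadilyIncreasing(TubeRecord t) {
        if (t.defectLevels.length < 2) {
            return false;
        }
        for (int i = 1; i < t.defectLevels.length; i++) {
            if (t.defectLevels[i] <= t.defectLevels[i - 1]) {
                return false;
            }
        }
        return true;
    }

    /** Assumed adjacency: the eight surrounding grid positions. */
    static boolean adjacent(TubeRecord a, TubeRecord b) {
        int dr = Math.abs(a.row - b.row);
        int dc = Math.abs(a.col - b.col);
        return Math.max(dr, dc) == 1;
    }

    /** Applies the four criteria: threshold, trend, neighbor of a defective tube, inspection age. */
    static List<TubeRecord> select(List<TubeRecord> tubes, double threshold, int maxYears) {
        List<TubeRecord> defective = new ArrayList<>();
        for (TubeRecord t : tubes) {
            if (t.latestLevel() > threshold) {
                defective.add(t);
            }
        }
        List<TubeRecord> selected = new ArrayList<>();
        for (TubeRecord t : tubes) {
            boolean nearDefect = false;
            for (TubeRecord d : defective) {
                if (adjacent(t, d)) {
                    nearDefect = true;
                    break;
                }
            }
            if (t.latestLevel() > threshold || steadilyIncreasing(t)
                    || nearDefect || t.yearsSinceInspection >= maxYears) {
                selected.add(t);
            }
        }
        return selected;
    }
}
```

In the paper these criteria live in an expert-system knowledge base rather than in
hard-coded conditions, so that a human expert can revise them without touching code.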

At the command of the planner, a domain-model is built using the inspection 
history database information. In addition, an inspector can enter plant-specific 
information. The inference engine determines the kind of flaw and specific region of 
the SG by consulting a knowledge base of rules reflecting the planning expertise. The 
rules are grouped into 3 categories as shown below: 

- Rules about what kind of action should be taken when a particular flaw is 
positively identified in a region. 

- Rules about what kind of action should be taken when a particular flaw in a 
region is suspected via past inspections. 

- Rules that determine which flaws are probable based on other sources of
information such as plant chemistry, leak rates, loose parts, vendor designs, etc.







The inspection sequence is decided after determining the tubes to be tested and the
inspection method. The sequence is in effect the sequence of robot movements for
signal acquisition, so the signal acquisition time is shortened as the robot's movement is
shortened. After the planning is completed, all the information is stored in the remote
database and the IPA then establishes a connection with the IRCA via the Internet or
the utility-owned Intranet in order to control the inspection robot according to the
planned schedule.

4.3 Inspection Robot Control Agent (IRCA)

As mentioned in earlier sections, it is more desirable to support declarative commands
than sequential commands in an Internet environment. The major declarative
commands can be defined as follows: moving the robot arm to the target tube,
inserting the probe into the tube, and acquiring the data signal from the probe sensor.
These commands cannot be issued directly through the device drivers;
consequently, they have to be decomposed into a set of low-level hardware
commands. Thereafter, the commands can be executed and monitored to detect and
cope with any exceptions. The agent that performs these jobs in the proposed system
is the IRCA. Accordingly, the IRCA is designed with a layered architecture including a
reactive reasoning layer, a real-time executive layer, and a device driver layer, as shown in
Figure 3. First, the device driver layer interfaces between the hardware components
and the upper layers. Second, the real-time executive layer manages the real-time
control threads and reasoning threads. Finally, the reactive reasoning layer generates
the behavior to achieve the goal.



[Figure 3 diagram; recoverable labels: "move to tube"; SX = fX(X, Y, ...), SY = fY(X, Y, ...);
Theta1, Theta2 = Inverse_Kinematics(SX, SY); ROBOT, PROBE MOTOR, PROBE SENSOR]



Fig. 3. Overall architecture. 



a. Device driver layer: The role of the device driver layer is to support the
communication between the hardware components and the higher-layer software
components. In the proposed architecture, the device driver is further divided into
several device driver layers. The lowest-level device driver layer is the IEEE488
device driver layer, which supports the IEEE488 signaling used to send and receive
messages. Because all the hardware components are linked through the IEEE488
interface, the other device drivers, such as the robot device driver, probe motor device
driver, and probe sensor device driver, are all built on top of the IEEE488 device
driver. When supporting the interface with the hardware, the device driver layer
extracts the logical data from the raw hardware data for the higher layer and
translates the logical commands of the higher layer into raw hardware commands.

b. Reactive reasoning layer: To explain the mechanism of this layer, the major
declarative command "go to the (row, col) tube" is used as an example. There are two
abstracted parts in this command. The first is the location part, that is, the target tube's row
number X and column number Y. The second is the action part, such as move, insert,
or acquire. The row and column numbers in the command do not contain the real
tube position in the SG; therefore, they must be translated to a
real position in the SG. However, since this real position is not directly used to control
the robot arm and pole, it is transformed, using inverse kinematics,
into the number of degrees that the arm and pole motors should rotate. Then, the
reactive reasoning module generates a set of hardware control commands using the
results. However, this set of primitive commands alone is still not sufficient to achieve the goal:
the reactive reasoning module must also react to errors or
unexpected conditions. To react, the environment information is
periodically analyzed to detect any changes, and if a meaningful change is detected, a
new reactive behavior is reasoned. This procedure is repeated until the goal is achieved.
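
The two translation steps described above (grid index to real position, then real
position to joint angles) can be sketched as follows. This is a sketch under
assumptions, not the SM-10 kinematics: it assumes a uniform rectangular tube pitch
and a planar two-link arm with link lengths l1 and l2, and it returns the solution
with a non-negative second joint angle. All names and parameters are hypothetical.

```java
class TubeKinematics {
    /** Maps a (row, col) index to an assumed position on a uniform rectangular grid. */
    static double[] tubePosition(int row, int col, double pitch, double originX, double originY) {
        return new double[] { originX + col * pitch, originY + row * pitch };
    }

    /**
     * Inverse kinematics of a planar two-link arm.
     * Returns {theta1, theta2} in radians, with theta2 in [0, pi].
     */
    static double[] inverse(double x, double y, double l1, double l2) {
        double c2 = (x * x + y * y - l1 * l1 - l2 * l2) / (2.0 * l1 * l2);
        if (c2 < -1.0 || c2 > 1.0) {
            throw new IllegalArgumentException("target out of reach");
        }
        double theta2 = Math.acos(c2);
        double theta1 = Math.atan2(y, x)
                - Math.atan2(l2 * Math.sin(theta2), l1 + l2 * Math.cos(theta2));
        return new double[] { theta1, theta2 };
    }

    /** Forward kinematics, used here only to check the inverse solution. */
    static double[] forward(double theta1, double theta2, double l1, double l2) {
        double x = l1 * Math.cos(theta1) + l2 * Math.cos(theta1 + theta2);
        double y = l1 * Math.sin(theta1) + l2 * Math.sin(theta1 + theta2);
        return new double[] { x, y };
    }
}
```

Math.toDegrees can convert the returned angles when the motor commands are expressed
in degrees, as the text suggests. A real controller would also have to respect joint
limits and choose between the two possible elbow configurations.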

For the mechanism of reactive reasoning, the subsumption architecture [11] is
modified. Each command is linked with a set of behaviors that constitute the
subsumption architecture. When the command is transmitted, the set of behaviors linked
with the command is targeted. Once targeted, the procedure for selecting the behaviors
is the same as in the original subsumption architecture. Briefly, if an external event or
exception occurs, specific behaviors are triggered to deal with it. Each behavior has a
priority, and among the triggered behaviors the one with the highest priority
is activated. Once activated, the behavior continues
until it yields or a higher-priority behavior arrives.
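
The priority-based selection just described can be sketched as a small arbiter. This
is an illustrative reading of the text, not the authors' implementation: the class
names, the string-valued events, and the behavior names used in any example are
hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** A behavior with a fixed priority and the set of events that trigger it. */
class Behavior {
    final String name;
    final int priority;
    final Set<String> triggers;

    Behavior(String name, int priority, String... triggers) {
        this.name = name;
        this.priority = priority;
        this.triggers = new HashSet<>(Arrays.asList(triggers));
    }
}

/** Keeps one active behavior; only a strictly higher-priority behavior can pre-empt it. */
class BehaviorArbiter {
    private final List<Behavior> behaviors = new ArrayList<>();
    private Behavior active;

    void add(Behavior b) {
        behaviors.add(b);
    }

    /** Returns the name of the behavior that is active after the event, or null if none. */
    String onEvent(String event) {
        Behavior best = null;
        for (Behavior b : behaviors) {
            if (b.triggers.contains(event) && (best == null || b.priority > best.priority)) {
                best = b;
            }
        }
        if (best != null && (active == null || best.priority > active.priority)) {
            active = best;
        }
        return active == null ? null : active.name;
    }

    /** Called when the active behavior finishes and gives up control. */
    void yieldActive() {
        active = null;
    }
}
```

For example, with behaviors moveToTube (priority 1) and stopMotors (priority 3), a
collision-expected event pre-empts the move, and a later move command cannot resume
until stopMotors yields.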

c. Real-time executive layer: The information from the hardware components and
the commands to the hardware components are multiplexed and scheduled in the
real-time executive layer in order to guarantee a real-time reaction. Along with
multiplexing and scheduling, this layer also controls the upper layer's reasoning tasks.
Because the reasoning tasks are only triggered when new information is received
from the hardware components, the scheduling and control of the reasoning tasks
by this layer is effective in avoiding malfunctions caused by a missed deadline of a
real-time reasoning task.

d. Exception handling mechanism: The exception handling mechanism in the IRCA
is as important as the declarative commanding structure of the IRCA, because
many exceptions can occur. If the exceptions are not filtered or treated inside the IRCA,
the whole inspection process will be delayed by them. In order to minimize
the interruptions caused by exceptions, the IRCA includes an exception-handling architecture.

The basic role of the exception-handling structure is to decide whether an exception
can be treated by the IRCA itself or not. The IRCA defines robot movement
exceptions, such as collision expected or exceed the tube, as treatable exceptions,
and hardware failure exceptions, such as an arm motor malfunction, as untreatable
exceptions. If a treatable exception is detected, it is filtered by the IRCA
itself; if an untreatable exception is detected, it is sent to the user for
notification.

Treatable exceptions are re-routed to the reactive reasoning
layer, where each re-routed exception is analyzed to
select a new behavior. For example, if an exceed-the-target-tube
exception arises, the reactive reasoning layer generates a motor stop command,
reduces the arm and pole motor speed to increase precision, and restarts the movement to the target
tube. If a user had to control the inspection robot to resolve an exceed-target-tube exception,
a lot of time would be wasted centering the probe in the target tube because of the delay
and jitter on the Internet. Consequently, the handling of these exceptions by the
control server itself is preferable in the design of a reliable system. In addition, the network
traffic is reduced by resolving the exception inside the control server itself.
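
The treatable/untreatable split can be sketched as a local filter that maps each
fault to the actions taken inside the control server. The fault names follow the
examples in the text; the action strings and the class names are hypothetical
placeholders, not commands of the real robot.

```java
import java.util.Arrays;
import java.util.List;

/** Fault kinds taken from the examples in the text, tagged as locally treatable or not. */
enum RobotFault {
    COLLISION_EXPECTED(true),
    EXCEED_TARGET_TUBE(true),
    ARM_MOTOR_MALFUNCTION(false);

    final boolean treatable;

    RobotFault(boolean treatable) {
        this.treatable = treatable;
    }
}

class ExceptionFilter {
    /** Returns the ordered list of local actions (placeholder names) for a fault. */
    static List<String> handle(RobotFault fault) {
        if (!fault.treatable) {
            // Still stop the robot locally before alerting the remote user.
            return Arrays.asList("STOP_MOTORS", "NOTIFY_REMOTE_USER");
        }
        switch (fault) {
            case EXCEED_TARGET_TUBE:
                return Arrays.asList("STOP_MOTORS", "REDUCE_MOTOR_SPEED", "RESTART_MOVE_TO_TARGET");
            default:
                return Arrays.asList("STOP_MOTORS");
        }
    }
}
```

Only the untreatable branch produces a message that has to cross the Internet, which
matches the traffic argument made above.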



4.4 Auxiliary Vision Agent (AVA) 

Currently, a human operator controls the robot and test probe remotely from a control
room. A camera is attached to the robot arm to confirm the correct position of the probe
while the robot is moving. In order to observe the insertion process, a CCTV monitor
is used in the control room to display the camera image. The human
operator can thus confirm that the robot is following the issued commands, and can sometimes
identify the calibration error between the logical position of the robot and its physical
location, which is caused by the mechanical part of the robot [9]. Similarly, in the
proposed agent-based approach, the task of identifying the calibration error and
compensating for the error is automatically performed as the main mission of the
AVA.

In this paper, it is assumed that an SM-10[9], a 2-axis horizontally-jointed robot 
used in NPPs worldwide, is used. The robot positioning is performed by the IRCA 
using kinematic information and predefined tube position data. The AVA performs a 
sequence of image processing and pattern recognition activities to fine-tune the 
probe’s auxiliary position; that is, this procedure confirms the exact position of the 
probe in order to prevent any unexpected accidents such as the destruction of the 
probe. 

The task of the AVA in fine-tuning the camera image data is as follows. Figure 4
(a) shows the real camera image and (b) the deviation from the desired center
position. The solid circle denotes the correct position of the tube in the image and the
dotted circle indicates the actual tube position. Δxi and Δyi denote the deviation of the
tube in the image space. These correspond to the deviation in world
coordinates. These errors can be easily compensated using kinematic information and
the current robot position, which is calculated using the encoder data. Since all the
information on the working environment, such as the pitches, diameter of the tubes,
number of tubes, etc., is known in advance, it is assumed without loss of generality
that Δxi and Δyi do not exceed half the tube diameter.




Fig. 4. (a) Real camera image, (b) Deviation from tube center and probe position.

In this case, the only task of the vision processing is the confirmation of the arm
position. The AVA has two modes to fulfill this task. The first mode sends the
compensation data to the IRCA for the automatic correction of the robot position, and
the second mode sends the data to the IPA for confirmation of the compensation data
by a human inspector. The second mode is used when the
compensation data exceeds the predefined threshold or during robot setup.
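
The mode selection can be sketched as follows. This is a minimal illustration under
assumptions: the image-to-world conversion is reduced to a single scale factor
(mmPerPixel), the size of the compensation is measured as the Euclidean length of
the deviation, and the names VisionCompensation and Route are hypothetical.

```java
class VisionCompensation {
    /** Where the compensation data is sent. */
    enum Route { IRCA_AUTO_CORRECT, IPA_HUMAN_CONFIRM }

    /** Converts an image-space deviation to world units with an assumed uniform scale. */
    static double[] toWorld(double dxPixels, double dyPixels, double mmPerPixel) {
        return new double[] { dxPixels * mmPerPixel, dyPixels * mmPerPixel };
    }

    /** Chooses the second mode when the deviation is too large or the robot is being set up. */
    static Route route(double dxMm, double dyMm, double thresholdMm, boolean robotInSetup) {
        double magnitude = Math.hypot(dxMm, dyMm);
        if (robotInSetup || magnitude > thresholdMm) {
            return Route.IPA_HUMAN_CONFIRM;
        }
        return Route.IRCA_AUTO_CORRECT;
    }
}
```

A real system would need a calibrated camera model rather than a single scale
factor; the sketch only shows how the threshold separates automatic correction from
human confirmation.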

The IRCA and AVA constantly interact with each other in order to perform all the 
processes automatically without any human intervention, thereby confirming the 
position of the probe. 



5 Implementation and Evaluation 

The IPA was implemented as a Java applet using a Java Expert System Shell [10], and
the IRCA was implemented using both Java and C. The computation-intensive
parts of the IRCA, such as the IEEE488 device drivers, were implemented
in C and interfaced to the Java part using JNI (Java Native Interface). For the
interaction between the IPA and IRCA, RMI (Remote Method Invocation) [6] was used,
which is a Java-based distributed object computing technology similar to CORBA. The
Java RMI and applet were built using JDK 1.1. An IPC (Inter-Process
Communication) system call in Windows32 was used to support the interaction
between the IRCA and the AVA. All software components on the server side were
developed on a Windows NT workstation.

Figure 5 shows the implemented user interface applet. Part A is the camera
image that is multicast by the MBONE [8] server on the server side. Using part A,
the user can check the robot movement. Part B is the applet-style GUI of the IPA for
controlling the inspection robot. Although the IPA constantly interacts with the IRCA
to exchange abstracted control commands and exception information,
the IPA applet has an additional GUI, shown in the upper part of
the applet, for sending very low-level robot hardware control commands. This is used
in the case of a malfunction in the IRCA or in the robot calibration stage during the
initial setup. In part B, the robot status map is used to send abstract commands and
show the estimated location of the robot arm and pole in the SG. The estimated
location is periodically updated.




Fig. 5. GUI of proposed tele-inspection system for SG tubes in NPP. 



6 Conclusions and Further Research 

The software for building a safety-critical tele-inspection system requires the
integration of various technologies including reactive reasoning used in the IRCA,
deliberative reasoning used in the IPA, and complex vision algorithms used in the
AVA. In order to effectively integrate these modules on the Internet, each module was
encapsulated in an agent based on a declarative commanding structure. As a result,
the inherent unreliability of using the Internet to operate an inspection robot is
reduced, and human operator intervention throughout the inspection process is
minimized. Accordingly, it was demonstrated that intelligent agent-based software
can be successfully deployed over the Internet even for a highly
safety-critical application such as a nuclear power plant SG tube inspection system.

In this paper, the agents communicated with each other using a Java RMI
interface, which reduced the implementation effort needed to
support the communication among agents on the Internet. In addition, the use
of Java technology in implementing the user interface enabled web-based interaction
between an operator and the agents. As a result, a user can easily access the agents
using a familiar web browser.












Acknowledgements 

The research described in this article was funded by EESRI (Electrical Engineering &
Science Research Institute) and KEPCO (Korea Electric Power Corporation) in
Korea.



References 

1. M. Cox and J. Baruch, “Robotic telescopes: An interactive exhibit on the
World-Wide Web,” Proc. 2nd Int. Conf. World Wide Web, Chicago, Oct. 1994.

2. E. Paulos and J. Canny, “Delivering Real Reality to the World-Wide Web via
Telerobotics,” Proc. IEEE Int. Conf. Robotics and Automation, Minneapolis,
Apr. 1996.

3. K. Goldberg, M. Mascha, S. Gentner, N. Rothenberg, C. Sutter, and J. Wiegley,
“Desktop Teleoperation via the World Wide Web,” Proc. IEEE Int. Conf.
Robotics and Automation, Nagoya, Japan, May 1995.

4. K. Taylor and J. Trevelyan, “Australia’s Telerobotic on the Web,” Proc. 25th Int.
Industrial Robots Symp., Singapore, Oct. 1995.

5. S. J. Kang, J. C. Moon, D. H. Choi, S. S. Choi, and H. G. Woo, “A Distributed
and Intelligent System Approach for the Automatic Inspection of Steam-
Generator Tubes in Nuclear Power Plants,” IEEE Trans. Nuclear Science, vol.
45, no. 3, June 1998.

6. Sun Microsystems, “Java Remote Method Invocation Specification beta draft,”
Dec. 1996.

7. D. R. T. Robinson, “The WWW Common Gateway Interface Version 1.1,”
Internet Draft, IETF, Feb. 1996.

8. H. Eriksson, “MBone: The Multicast Backbone,” Communications of the
ACM, vol. 37, pp. 54-60, August 1994.

9. Zetec Inc., SM-10 Operation Guide System DISC Series 200/300 Edition 18.6,
Rev. 5, Dec. 1988.

10. M. Watson, Intelligent JAVA Applications for the Internet and Intranets,
Morgan Kaufmann, 1997.

11. R. Brooks, “Intelligence without Reason,” in Proc. International Joint
Conference on AI, Sydney, Australia, August 1991.




The Use of AI Methods for Evaluating Condition 
Dependent Dynamic Models of Vehicle Brake Squeal 



Simon Feraday^1, Chris Harris^1, Kihong Shin^2, Mike Brennan^3, and
Malcolm Lindsay^4

^1 Image, Speech and Intelligent Systems Research Group
Building 1, University of Southampton, Southampton, SO17 1BJ, UK
{saf97r, cjh}@ecs.soton.ac.uk

^2 Now at: School of Mechanical Engineering, Hanyang University, Seoul, South Korea
kshin@orgio.net

^3 Institute of Sound and Vibration Research, Tizard Building
University of Southampton, Southampton, SO17 1BJ, UK
mjb@isvr.soton.ac.uk

^4 TRW Braking Systems, Oldwych Lane East
Kenilworth, Warwickshire, CV8 1NR, UK
malcolm lindsay@trw.com



Abstract. A neurofuzzy modelling technique is used to predict the differential
equation coefficients of brake noise time histories as functions of braking test
conditions. These are then related to the 3rd order differential equations
governing a candidate mathematical model of brake squeal using a second
neurofuzzy model. This determines whether similar or sensible parametric
changes in the model are required to mirror the dynamic effects of changes in
experimental condition parameters. An assessment of the efficacy of the
candidate model is then made based on this analysis. The results of different
candidate models could be likewise compared to determine which is most
realistic.



1. Introduction 

Disc brake squeal is a complex and fugitive phenomenon. Present technologies
employed to aid our understanding and ability to predict it include finite element
analysis (Liles, 1989) and double pulsed laser holography (Felske et al., 1978). Some
good results have been obtained through finite element work but only across a limited
range of similar operating conditions. Modelling a friction interface is a notoriously
complex and imprecise task. Also, braking conditions of speed, temperature and
pressure are not accounted for, and experimentation shows these factors to be critical
to squeal generation. Double pulsed laser holography provides a visual validation of
finite element work but little clue as to how to improve brake design. Some lumped
parameter modelling work also continues, though without some experimental support
it is difficult to reconcile this with realistic brake designs. Substantial experimentation is
still required today to validate a brake design before it enters service.



R. Loganantharaj et al. (Eds.): IEA/AIE 2000, LNAI 1821, pp. 46-55, 2000.
© Springer-Verlag Berlin Heidelberg 2000







Black box modelling provides a potential way forward in the search for optimal
brake design but, used crudely as an overall noise prediction tool, does not aid our
understanding of the dynamic processes involved. An automated, data-driven, system
identification type method is presented here which enables more detailed prediction of
squeal (i.e. including frequency information) and also a degree of understanding of the
underlying process dynamics.

The idea proposed here is based on building auto-regressive (AR) models of audio
time series recordings of brake noise under different braking conditions (of speed,
disc temperature and pressure). These one-step-ahead prediction models are converted
to characteristic differential equations (DEs) for each braking condition. The DE
parameters are then modelled as functions of the braking conditions using a black-box
type AI technique, neurofuzzy modelling (Brown and Harris, 1994). The models
constructed should give some indication of the underlying dynamic processes
involved in the production of brake squeal as well as allowing prediction of squeal
frequencies and volume. In order to take the analysis further, a method of validating
candidate mathematical models against the experimental data (parametric
reconciliation) then follows. This involves using a further neurofuzzy model to map
from the predicted DE coefficients to parameter values in the candidate mathematical
model. If the candidate model's parameters reflect the experimental conditions that
generated the original DEs then the model has some degree of validity. Given several
models it should be possible to make a comparative assessment of which one most
accurately portrays the underlying dynamic processes in the experiments.
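
To make the term "one-step-ahead prediction" concrete, the sketch below evaluates an
AR model of order p, in which the next sample is predicted as a weighted sum of the
p most recent samples. The class and method names are hypothetical, and the
coefficients in any example are arbitrary; estimating the coefficients from data and
converting them to DE coefficients are not shown and are not specified in this part
of the paper.

```java
class ArModel {
    /**
     * One-step-ahead AR prediction.
     * coefficients[0] multiplies the most recent sample, coefficients[1] the one before, etc.
     */
    static double predictNext(double[] coefficients, double[] history) {
        if (history.length < coefficients.length) {
            throw new IllegalArgumentException("history shorter than model order");
        }
        double prediction = 0.0;
        for (int i = 0; i < coefficients.length; i++) {
            prediction += coefficients[i] * history[history.length - 1 - i];
        }
        return prediction;
    }
}
```

For instance, with coefficients {0.5, 0.25} and a history ending in 2, 4, the
prediction is 0.5 * 4 + 0.25 * 2 = 2.5.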



2. Neurofuzzy Modelling 

Once characteristic AR or DE regressors are extracted from a number of time series
recordings of disc brakings under different test conditions (of disc temperature, speed
and brake pressure), an intelligent modelling algorithm can be applied to
predict their values as a function of those conditions. Here neurofuzzy networks
are used to model input/output data transformations, trained using the Adaptive Spline
Modelling of Observation Data (ASMOD) algorithm (Kavli, 1992). This technology
attempts to construct parsimonious (as simple as possible whilst still accurate) fuzzy
models of unknown functions or processes based on observed data inputs and outputs.

Fuzzy logic was introduced by Zadeh (1965) as a way of producing an
interpolated mapping between simple, linguistically defined transformations (rules).
Each input axis is split into a number of membership groups. Any point on an input
axis is a partial member of several groups, the group memberships summing to 1 in
accordance with the B-spline membership functions derived by Cox (1972) and
De Boor (1972). Each group has a weight, and the system output at any input state is
the sum of the group memberships at that state multiplied by the corresponding
weights.
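
The one-dimensional case described above can be sketched with the simplest B-spline
shape. This is an illustration under assumptions, not the ASMOD implementation: it
uses order-2 (triangular) B-splines, for which any input is a partial member of at
most two neighbouring groups, and the class name FuzzyAxis and its knot and weight
arrays are hypothetical.

```java
class FuzzyAxis {
    private final double[] knots;   // strictly increasing group centres
    private final double[] weights; // one weight per group

    FuzzyAxis(double[] knots, double[] weights) {
        if (knots.length < 2 || knots.length != weights.length) {
            throw new IllegalArgumentException("need at least two knots and one weight per knot");
        }
        this.knots = knots.clone();
        this.weights = weights.clone();
    }

    /** Triangular memberships; at most two are non-zero and they always sum to 1. */
    double[] memberships(double x) {
        int n = knots.length;
        double clamped = Math.max(knots[0], Math.min(knots[n - 1], x));
        int i = 0;
        while (i < n - 2 && clamped > knots[i + 1]) {
            i++;
        }
        double[] mu = new double[n];
        double right = (clamped - knots[i]) / (knots[i + 1] - knots[i]);
        mu[i] = 1.0 - right;
        mu[i + 1] = right;
        return mu;
    }

    /** Output = sum of memberships multiplied by the corresponding weights. */
    double output(double x) {
        double[] mu = memberships(x);
        double y = 0.0;
        for (int i = 0; i < mu.length; i++) {
            y += mu[i] * weights[i];
        }
        return y;
    }
}
```

With knots {0, 1, 2} and weights {10, 20, 40}, the input 0.25 has memberships
{0.75, 0.25, 0} and output 0.75 * 10 + 0.25 * 20 = 12.5, so the model interpolates
linearly between the rule weights. Higher-order B-splines give smoother outputs at
the cost of more active groups per input.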

With multi-dimensional models each input axis has a set of membership groups
and each permutation of groups (with one from each axis) has a weight. The
multi-dimensional group memberships are the product of the group memberships from each
axis. Thus, if in a model with 3 inputs, the system is a partial member of 2 groups on
each input axis, then the overall output is the sum of 2^3 = 8 group memberships
multiplied by 8 corresponding weights. This leads to a phenomenon termed 'the curse
of dimensionality', whereby the number of model weights grows exponentially with
the dimensionality of a model, making modelling of high dimensional processes
impractical. This problem can be alleviated by dividing up a high dimensional model
into several lower dimensional sub-models, the outputs from which can be summed to
give a total output (as shown diagrammatically in figure 1). NB. This can only be
done if there is no cross-coupling between the effects of inputs assigned to different
sub-models. The structure of the model, in terms of the number and dimensionality of
sub-models used and which of the available input parameters are chosen for which
sub-models, is determined by Kavli's ASMOD algorithm (Kavli, 1992).
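
The weight counts behind this argument can be written out explicitly. The block
below is a worked illustration under the simplifying assumption that every axis has
the same number m of membership groups; the symbols m, n, S, n_s and the numbers in
the example are not taken from the paper.

```latex
% Full n-dimensional lattice: one weight per combination of groups
y(\mathbf{x}) = \sum_{i_1,\dots,i_n}
  \Bigl( \prod_{k=1}^{n} \mu_{i_k}(x_k) \Bigr) w_{i_1 \dots i_n},
\qquad N_{\text{full}} = m^{n}

% Additive decomposition into S sub-models, sub-model s having n_s inputs
y(\mathbf{x}) = \sum_{s=1}^{S} y_s(\mathbf{x}_s),
\qquad N_{\text{add}} = \sum_{s=1}^{S} m^{n_s}

% Example: m = 7 groups per axis, n = 6 inputs, versus three 2-D sub-models
N_{\text{full}} = 7^{6} = 117649,
\qquad N_{\text{add}} = 3 \cdot 7^{2} = 147
```

The saving is only available when the inputs assigned to different sub-models do not
interact, which is the cross-coupling condition stated above.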




Fig. 1. Additive fuzzy sub-models



3. Parametric Reconciliation Architecture 




Fig. 2. Parametric reconciliation architecture



The fundamental system architecture used for this work is shown in figure 2. This
consists of two trained neurofuzzy networks through which experimental condition
queries pass to be converted to equivalent parameters in a candidate mathematical
model. This is used to evaluate the efficacy of the candidate model: if
experimental parametric changes translate to similar changes in equivalent parameters
in the candidate model (raised temperature etc.) then this provides a good basis on
which to validate the candidate model.

The technique uses two neurofuzzy networks. The first (on the left of figure 2) is the mapping from experimental conditions to DE coefficients for the system output under those conditions (explained in section 3.1). Depending on the quality of experimental data available, this model usually contains most of the overall mapping error, since the experimental coefficients are typically stochastic. The second neurofuzzy model, on the right of figure 2, is particular to the candidate model being evaluated. It inverts the candidate model mapping, going from output DE coefficients to the model parameters required to produce such output. Although isolating appropriate parameter values is a tedious optimisation task, an accurate neurofuzzy model of this function can usually be produced, subject to the uniqueness of the inverse. Production of such a model is discussed further in section 3.2.

3.1 Mappings of Experimental Conditions to Output DE Coefficients 

Data acquisition. The experimental data for this work were recorded from a disc brake on a test rig across a matrix of different braking conditions. Only stop tests were used, in which the disc speed reduces under the influence of the applied brake, as opposed to the disc being driven at constant speed against the brake. Recordings were triggered as brake speed fell below 200 RPM and continued for 5 seconds at a sampling rate of 62.5 kHz. Work by Shin et al. (1999) suggests the high sampling rate is worthwhile.

162 different braking stops were recorded, each under different conditions of applied pressure and disc temperature (the starting speed always being the same). The average friction level, μ, was also recorded for each stop as a ‘passive’ condition, i.e. measured but not set. Disc temperature was pre-set for each stop by pre-conditioning brakings. The data for each braking stop were recorded and then processed into AR and DE coefficients using a PC-based maths package. The (3rd order) DE coefficients of the squeal output from each stop were then modelled as functions of the corresponding stop test conditions using an implementation of the ASMOD algorithm.

AR & DE model elicitation. An AR model is simply a one-step-ahead prediction of a time series x(t) formed as a linear combination of its past values, i.e.

$$x(t) + a\,x(t-1) + b\,x(t-2) + c\,x(t-3) = 0 \qquad (1)$$

Based on the assumption of a stationary signal (not too stringent an assumption, since experiment shows that brake squeal does not tend to alter greatly in frequency within one stop), this reduces a time series of any length to a fixed number of parameters which can be meaningfully compared with those from other time series.

There are a number of ways of calculating AR coefficients of a data series, and indeed there are no ‘correct’ values for AR coefficients. The method used here is the simple Moore-Penrose pseudo-inverse of the Hankel (delay) matrix of the time series. This evaluates the regressors with least-squares error and is preferred here both for its simplicity and because, unlike PCA for instance, the average stability of the system poles of the models produced generally reduces with increasing signal/noise ratio, which allows the models to be used for squeal volume prediction.
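The pseudo-inverse calculation can be sketched in a few lines of Python with NumPy. This is a minimal sketch under stated assumptions rather than the maths package routine used for this work: the function name `ar_coefficients` and the row layout of the delay matrix are illustrative choices, and the sign convention follows equation (1).

```python
import numpy as np


def ar_coefficients(x, order=3):
    """Least-squares AR fit so that x[t] + a1*x[t-1] + ... + ap*x[t-p] ~ 0."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    # Delay (Hankel-structured) matrix: the row for time t holds
    # x[t-1], x[t-2], ..., x[t-order].
    delays = np.column_stack([x[order - k : n - k] for k in range(1, order + 1)])
    # Moving the regressors to the right-hand side gives delays @ coeffs = -x[t].
    target = -x[order:]
    return np.linalg.pinv(delays) @ target
```

For equation (1), `a, b, c = ar_coefficients(signal, order=3)`, where `signal` is a hypothetical one-dimensional array of samples from one stop.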

The model order (number of regressors) should be chosen with care in order both to reflect the experimental data accurately and to preserve the uniqueness of the mappings. Methods exist for calculating an ‘optimum’ number of regressors from the perspective of representing the time series, most notably by use of singular value decomposition, e.g. Shin et al. (1999). Here, however, the order is constrained to equal that of whatever candidate models are proposed, which requires that these be carefully chosen. The simplest satisfactory situation, and that employed here, is to have the model order equal to both the number of parameters controlling the experimental data and the number of free parameters controlling the candidate model.

The system poles (polynomial roots) of the AR model yield the underlying 
frequencies and stabilities of stationary components in the signal. They are also used 
to convert the AR model to a DE for easier compatibility with the candidate model. 
Conversion between the two representations is achieved simply by taking logs of the 
roots and re-expanding them (thus converting from z-plane to s-plane roots). 
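A minimal sketch of this conversion is given below, assuming NumPy. The function name `ar_to_de` is hypothetical, and the scaling by the sampling rate (s = ln(z) multiplied by the sample rate) is the standard z-plane to s-plane relation assumed here rather than a detail stated in the text. The result is only meaningful when complex z-plane poles occur in conjugate pairs and no pole lies on the negative real axis.

```python
import numpy as np


def ar_to_de(ar_coeffs, sample_rate_hz):
    """Map AR coefficients (a, b, c) to DE coefficients (A, B, C)."""
    # z-plane poles: roots of z^3 + a z^2 + b z + c.
    z_poles = np.roots(np.concatenate(([1.0], ar_coeffs)))
    # s-plane poles: s = ln(z) / dt, with dt = 1 / sample_rate_hz.
    s_poles = np.log(z_poles.astype(complex)) * sample_rate_hz
    # Re-expand into a monic polynomial s^3 + A s^2 + B s + C.
    de_poly = np.poly(s_poles)
    # Imaginary parts cancel for conjugate pairs; any residue is discarded.
    return np.real(de_poly[1:])
```

A call such as `A, B, C = ar_to_de([a, b, c], 62500.0)` would use the sampling rate quoted for the recordings.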

Once characteristic DEs have been constructed for the brake noise recorded under each of the 162 available conditions, a neurofuzzy model is trained using the ASMOD algorithm (Kavli, 1992) to predict those DE coefficients as functions of braking conditions.

Results on experimental data. This section presents graphically the neurofuzzy models of the 3rd order DE parameters elicited from recordings of a Nissan brake under test on a rig in accordance with the conditions below.



Table 1. Data capture conditions

Date recorded:                      4th August 1999
No. of squeal recordings:           162 (all stop tests)
Disc radius:                        296 mm
Trigger speed:                      200 RPM
Condition parameters recorded:      1. Brake pressure (1.9-35.2 bar)
                                    2. Average stop μ (0.43-0.76)
                                    3. Disc temperature (58-264 °C)
Model statistical significance      Minimum description length
measure:                            (Rissanen, 1978)



Amongst the 3 available condition parameters, the ASMOD algorithm selected only disc temperature as statistically significant to the prediction of the output DE coefficients. When processed as per table 1, the 162 data rows of conditions (inputs) vs. DE coefficients (outputs) produce the following graphically defined neurofuzzy models of DE parameters A, B and C (as per equation 2, below) with respect to temperature:








$$\frac{d^3x}{dt^3} + A\,\frac{d^2x}{dt^2} + B\,\frac{dx}{dt} + C\,x = 0 \qquad (2)$$



Table 2. Fuzzy models of experimental DE parameters vs. disc temperature with corresponding target/output plots




The plots in the left column show the (albeit crude) fuzzy models predicting the DE parameters A, B and C as functions of experimental disc temperature (58-264 °C). The corresponding target/output graphs in the right-hand column show the actual values of A, B or C in the training data (targets) vs. the model predictions (outputs). These graphs, which would ideally show perfect 1:1 correlation between target and output, in fact show the models to be far from perfect; nonetheless, some underlying structure has been captured.

The neurofuzzy models of A,B and C are then used to generate the primary system 
stability (real part of most unstable oscillatory system pole) and frequency (from the 
imaginary part of the same root) as functions of temperature. These functions are 
plotted in figures 3 and 4. 
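The extraction of the primary root can be sketched as follows, assuming NumPy. The function name `primary_pole` and the tolerance used to decide whether a root is oscillatory are assumptions made for this illustration; the conversion of the imaginary part from rad/s to Hz assumes the DE is expressed in seconds.

```python
import numpy as np


def primary_pole(A, B, C, imag_tol=1e-9):
    """Return (stability, frequency_hz) of the most unstable oscillatory pole.

    The poles are the roots of s^3 + A s^2 + B s + C. Returns None when no
    root has a non-zero imaginary part.
    """
    poles = np.roots([1.0, A, B, C])
    oscillatory = poles[np.abs(poles.imag) > imag_tol]
    if oscillatory.size == 0:
        return None
    # "Most unstable" is taken here as the largest real part.
    pole = oscillatory[np.argmax(oscillatory.real)]
    return float(pole.real), float(abs(pole.imag) / (2.0 * np.pi))
```

Evaluating this function on the modelled A, B and C at stepped temperatures would give curves of the kind plotted in figures 3 and 4.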

The model defined in table 2 thus predicts increasing stability of the primary root 
(lower squeal volume) and lower primary squeal frequency as temperature increases 
throughout the experimental range. 





Fig. 3. Primary stability of 3rd order neurofuzzy DE model as a function of temperature

Fig. 4. Primary frequency (in Hz) of 3rd order neurofuzzy DE model as a function of temperature



3.2 Mappings of DE Coefficients to Candidate Model Parameters 

Candidate model. The neurofuzzy network on the right of figure 2 maps from output DE coefficients to parametric values in a candidate model. A number of obstacles must be overcome to facilitate this, namely optimisation of the internal model parameters such that the range of experimental DEs can be reproduced by reasonable values of the model coefficients of interest. Firstly, however, a candidate model from which output DEs can be produced is selected. The only constraint is that the model’s dynamics be of the same order as the experimental DEs extracted. The 3rd order model used here to demonstrate this technique is shown in figure 5, below.




Fig. 5. Temperature dependent μ/v instability model

The self-excited vibration in this model is triggered by a decrease in the coefficient 
of friction with respect to increases in the relative speed of the disc and pad 
according to: 










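$$\mu = \mu_0 - a\,(v - \dot{x}) \qquad (3)$$

Temperature affects the spring rates and damper value according to:

$$k_1 = 1.8763 \times 10^{[\ldots]} \times \left(1 - [\ldots]\right) \qquad (4)$$

$$k_2 = 2.8389 \times 10^{[\ldots]} \times \left([\ldots]\right) \qquad (5)$$

$$c = 49538 \times \left([\ldots]\right) \qquad (6)$$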



as indicated in figure 5. The spring rates and damping coefficient thus decay linearly with temperature, T, in degrees Celsius. Ignoring the temperature effect, the candidate model is governed by the following 3rd order differential equation:

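$$\lambda^3 + \frac{m\,k_{[\ldots]} - p\,a\,c}{m\,c}\,\lambda^2 + \frac{c\,k_{[\ldots]} + c\,k_{[\ldots]} - p\,a\,k_{[\ldots]}}{m\,c}\,\lambda + \frac{[\ldots]}{m\,c} = 0 \qquad (7)$$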

where the roots, λ, are the system poles. The coefficients A, B and C (as per equation (2)) are simple to calculate from the model coefficients m, p, a and T, knowing the dependencies of k1, k2 and c on T (equations (4)-(6)). Were there more model parameters, the inverse mapping would be non-unique. In the case here, however, p and a are always coupled, which reduces the candidate model parameter space to 3 dimensions, the same as the DE order. The temperature relationships in equations (4)-(6) have been optimised such that the experimental DEs are reproduced by the model somewhere within the hypercube in model parameter space defined in table 3, below, which corresponds to realistic values of the model parameters of mass, pressure, temperature and a.

Table 3. Operational hypercube in model parameter space

m       0.2715-0.4271 kg
T       56-201 °C
pa      451-1791 N/m^



Parametric inversion results. If random parameter values are chosen within the constraints of table 3, the corresponding DE coefficients (as per equations (4)-(7)) can be extracted and used to generate a data set for the ‘DE → model parameter’ neurofuzzy model in question. The neurofuzzy model is trained to model the function inverse by applying the DE coefficients as training inputs and the model parameters used to generate them (m, T, pa) as training targets. Although the neurofuzzy models produced for such inverses are often quite complex, they are generally of good accuracy since the model data are entirely deterministic (see the target/output graphs for the extracted models in table 4, below).
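The data generation step can be sketched in Python as follows. The parameter ranges are those of table 3, but `toy_forward_model` is a purely hypothetical stand-in for equations (4)-(7), and the function names are illustrative; training of the neurofuzzy model itself is not shown.

```python
import random

# Parameter ranges from table 3 (m in kg, T in degrees Celsius, pa as tabulated).
HYPERCUBE = {"m": (0.2715, 0.4271), "T": (56.0, 201.0), "pa": (451.0, 1791.0)}


def toy_forward_model(m, T, pa):
    """Hypothetical smooth map (m, T, pa) -> (A, B, C); NOT equations (4)-(7)."""
    stiffness = 1.0 - 0.001 * T          # arbitrary linear decay with temperature
    A = (stiffness - 1e-4 * pa) / m
    B = stiffness * (2.0 - 1e-4 * pa) / m
    C = stiffness ** 2 / m
    return A, B, C


def make_inverse_training_set(forward_model, n_samples, seed=0):
    """Sample the hypercube and pair DE coefficients (inputs) with parameters (targets)."""
    rng = random.Random(seed)
    inputs, targets = [], []
    for _ in range(n_samples):
        params = {name: rng.uniform(lo, hi) for name, (lo, hi) in HYPERCUBE.items()}
        inputs.append(forward_model(**params))                    # (A, B, C)
        targets.append((params["m"], params["T"], params["pa"]))  # what to recover
    return inputs, targets
```

Swapping the roles of inputs and targets in this way is what turns samples of the forward model into training data for its inverse.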








Table 4. Parametric inverse model target/output plots 




4. Model Evaluation 

If query data are now passed through both neurofuzzy networks as per the architecture of figure 2, it is hoped that the resulting candidate model parameters will reflect the querying experimental input parameters, thus validating the candidate dynamic model. As the first (experimental condition → DE) network only uses the disc temperature input, a series of queries is put through the networks at stepped values across the range of the first network’s input space (58-264 °C). The resulting values of the dynamic model’s parameters required to mirror the effect of the experimental temperature function in DE space are shown in the table below.



Table 5. Model parametric changes to mirror the effects of experimental temperature 




In a perfect model, temperature (middle graph) would translate to the same value as experimental temperature, whilst the other 2 parameters (m and pa) would not change with respect to changing experimental temperature. The mapping with respect to the model temperature function is quite convincing, although there also have to be changes in m and pa to simulate the precise effects of experimental temperature change. Overall the model can be validated as sufficient for basic purposes, particularly in the region of 150-200 degrees Celsius where changes in the pa parameter are small. For more detailed and wider-ranging prediction work, however, a more accurate (and probably more complex) model would be required.






5. Conclusions 

An AI-based technique is presented for ‘parametric reconciliation’ between a complex real-world phenomenon (vehicle brake squeal) and a (simple) 3rd order candidate mathematical model of the process. In so doing, this technique allows evaluation of the candidate model in terms of how realistically the model parameters mirror the effects of their real-world counterparts under test conditions.

The model presented here requires significant changes in all 3 of its parameters to 
mirror the effects of temperature changes in the recorded experimental data although 
the state locus of the model’s temperature parameter closely reflects that of the 
experimental temperature function. The model presented is deemed acceptable for 
work within a region of the temperature range (approx. 150-200 Celsius) where the 
changes required to the other parameters are not too prohibitive. 

A number of candidate models would typically be evaluated in this way. Each 
would require an inverting network to be produced for it (as per section 3.2). The 
candidate model with parameters best matching their equivalents in the experimental 
data would then be selected as the most representative of the underlying experimental 
dynamics and the most suitable for experimental prediction. 

If none of the candidate models tested prove sufficiently accurate, the technique 
may still give clues as to how to adapt a candidate model to the experimental data (for 
instance by mapping from experimental temperature directly to k and c in the example 
given here producing ideal k(T) and c(T) functions). The work is, of course, not 
limited to vehicle brake squeal and can be used to validate dynamic models of any 
unseen process based on empirical data from that process. 

References 

1. Brown M. and Harris C.J. (1994) Neurofuzzy adaptive modelling and control. Prentice Hall, Hemel Hempstead, UK

2. Cox M. (1972) The numerical evaluation of B-splines. Jnl. Inst. Math. Appl., vol 10, pp 134-149

3. DeBoor C. (1972) On calculating with B-splines. J. Approx. Theory, vol 6, pp 50-62

4. Felske A., Hoppe G. and Matthai H. (1978) Oscillations in squealing disc brakes - analysis of vibration modes by holographic interferometry. SAE Paper 780333

5. Kavli T. (1992) ASMOD - an algorithm for Adaptive Spline Modelling of Observation Data. Int. Jnl. of Control, vol 58, pp 947-967

6. Liles G. (1989) Analysis of disc brake squeal using finite element methods. SAE Paper 891150

7. Rissanen J. (1978) Modelling by shortest data description. Automatica, vol 14, pp 465-471

8. Shin K., Feraday S., Harris C.J. and Brennan M. (1999) Optimal auto-regressive modelling of a measured noisy time series using SVD. Proc. Internoise ’99 Conference, Florida, USA, Dec 1999

9. Zadeh L. (1965) Fuzzy sets. Jnl. Information and Control, vol 8, pp 338-353




Towards an Estimation Aid for Nuclear 
Power Plant Refuelling Operations 



J.A. Steele*, L.A. Martin, A. Moyes, S.D.J. McArthur, J.R. McDonald, D. Young, R. Elrick, D. Howie, and LY. Yule

Centre for Electrical Power Engineering, Department of Electronic & Electrical Engineering, Royal College, University of Strathclyde, Glasgow, UK. G1 1XW
Electrical & Control Design Section, British Energy Generation Ltd, Redwood Crescent, Peel Park, East Kilbride, UK. G74 5PR
British Energy Generation Ltd, Barnett Way, Barnwood, Gloucester, UK. GL4 3RS
British Energy Ltd, Torness Power Station, Dunbar, East Lothian, EH42 1QS
* Corresponding author: j.a.steele@strath.ac.uk

Abstract: Analysis of monitored refuelling data is required to confirm that refuelling operations have been correctly completed and that the reactor plant is in a safe condition for continued operation. This paper describes a methodology for identifying key points in the refuelling process, thereby providing decision support for post-refuelling analysis. A feature identification technique is described which provides reliable input to Artificial Neural Networks (ANNs) and regression estimation techniques. This technique is shown to be robust against variations in the input data. The analysis in this paper shows that the regression models and ANNs can also provide similarly accurate predictions of a key refuelling event.



1 Introduction 

Contained within the nuclear power plant under study are two reactors containing 
over 300 nuclear fuel assemblies each. After several years the reactivity of this fuel 
decreases and refuelling is required, which involves the removal of the used fuel and 
insertion of new fuel. This is performed by a fuelling machine which contains 
mechanisms for the removal and insertion of fuel, as well as the recording of load and 
height data. The load data refers to the load which the fuel exerts on the machine as it 
is lowered into the reactor and the height data signifies the distance through which the 
fuel has been lowered. 

It is the function of a fuel route specialist to monitor these data values during 
the insertion of the new fuel, to ensure that it is correctly set down within the reactor 
core. It is therefore helpful for the operator to be aware of the expected point at which 
touchdown of the fuel will occur. This paper presents the first stage of work in the 
creation of a decision support system (DSS) to support the operator in this monitoring 
operation, as shown in Figure 1. A methodology is developed that performs analysis 
of the load data and subsequently makes a prediction of the point at which fuel 
touchdown will occur. 

The DSS analyses the data within the last 50cm before touchdown. This region was selected because it is the primary area for operator analysis. The data is derived from the fuelling machine. Feature identification is then applied to this load data, and when a certain feature is identified within the data, the prediction of time until touchdown is made. This paper is concerned with the processes of feature identification and touchdown prediction.

R. Loganantharaj et al. (Eds.): IEA/AIE 2000, LNAI 1821, pp. 56-67, 2000.
© Springer-Verlag Berlin Heidelberg 2000

Artificial Neural Networks (ANNs) and linear regression are investigated as potential techniques for touchdown estimation. Given the requirement for reliable operation within a nuclear power station environment, an approach to dealing with noise and variations in fuelling machine data is utilised. The use of this technique reduces the scope for errors in feature identification and hence subsequent potential errors in touchdown prediction.




Fig 1. The decision support system 



1.1 Refuelling Load Trace 

Equipment on the fuelling machine monitors the load exerted by the fuel assembly on 
the grab mechanism. A plot of the load and height of the grab against time is provided 
in Figure 2. The height of the grab is measured from the top of the refuelling machine. 




In Figure 2, at time=0, the fuelling machine is located above the appropriate reactor channel and the grab attaches to the used fuel assembly, still located within the reactor. During the first 3 minutes, the grab progressively lifts the fuel. The load on the grab increases in a stepwise manner throughout this 3 minute period, which corresponds to the different sections of the fuel assembly transferring their load onto the grab. This occurs because the fuel assembly is not one solid module, but a number of separate components coupled together via a tie bar. After 30 minutes the spent fuel assembly is fully removed, having been transferred to the fuelling machine, and the new fuel assembly is then placed into the channel, with the entire refuelling process lasting approximately one hour.

1.2 Touchdown 

A typical trace of the grab load during the 50cm region prior to fuel touchdown is 
indicated in Figure 3. Fuel touchdown is the point at which the bottom of the fuel 
assembly ‘touches’ the fuel assembly support within the reactor. Figure 3 corresponds 
to an enlarged version of the last 50cm section of the load, as shown in Figure 2. 

The distinct peaks and troughs of Figure 3 are caused by sections of the fuel 
assembly passing through the distinct features of the fuel channel. The coolant gas 
circulating throughout the reactor also affects the apparent weight of the fuel assembly 
as it is passing through the fuel channel. 




Fig 3. Load on the grab for 50cm prior to fuel assembly touchdown

It is the aim of this research to predict the time at which fuel touchdown occurs, 
using information from the peaks and troughs of the signal produced 50cm prior to 
fuel touchdown, as shown in Figure 3. 

1.3 Data Available 

Since 1993, the Fuelling Machine Diagnostic & Monitoring (DMS) computer system at Torness Nuclear Power Station (UK) has been storing the load and height data resulting from the refuelling process (hundreds of cases). Each case study stores the time (sampled at 5 Hz), height and load on the grab. With each case study lasting for over an hour, the resulting database was approximately 500Mb in size. With this amount of data, knowledge discovery in databases (KDD)[1] can be employed, or a certain subsection of each case study selected for study.

With the focus of the research being the prediction of the point at which the 
new fuel assembly touches down, all of the data except for the last 50cm was removed 
from the case studies. 

2 Analysis of Touchdown Data 

The trace in Figure 3 shows the typical load variations that would be expected before touchdown. The main characteristic in the trace is the occurrence of a peak in the load, Lpk, which is preceded by a trough (trough 1) and followed by 2 subsequent troughs (troughs 2 and 3). Since this characteristic occurs in most cases, identification of these key features can be used as an indicator that the fuel assembly is approaching the time of touchdown. Furthermore, information about these key features can be used to predict the point of fuel touchdown.

These key characteristics (Lpk, troughs 1, 2 and 3) all usually occur in each case study, where Lpk normally corresponds to the maximum value of load. However, as shown in Figure 4, some of the case studies analysed are more sensitive to load variations than others, and consequently Lpk does not occur at the maximum load value, as it does in the characteristic load plot in Figure 3. As mentioned previously, it is important in this application to ensure that the same analysis provides reliable output for all data, and is not dependent on the sensitivity of the input data. It is therefore necessary to identify the main features of the signal before touchdown. The identification of the load peak of interest (Linterest), which has a trough before and after it, is analysed in this section.

A feature extraction method was therefore used in which the load signal is represented according to the positions of the trough shapes. Once a candidate trough was identified, it was evaluated against the relative positions of the surrounding peaks and troughs to establish whether it was a trough of interest. The troughs were identified through a first order differencing of the load data, i.e. subtracting the previous load value from the current load value, followed by a moving sum window, i.e. adding a set number of differencing values, as illustrated in Figure 5.

The original load data L is differenced to produce the set Ldiff. This is then summed over a set window size to produce the set Lsum. Thus,

$$L = \{L_1, L_2, L_3, L_4, \ldots, L_n\}, \qquad (1)$$

$$L_{\mathrm{diff}} = \{0,\ (L_1 - L_2),\ (L_2 - L_3),\ (L_3 - L_4),\ \ldots,\ (L_{n-1} - L_n)\}, \qquad (2)$$

and

$$L_{\mathrm{sum}} = \{0,\ 0,\ (L_1 - L_4),\ (L_2 - L_5),\ (L_3 - L_6),\ \ldots,\ (L_{n-3} - L_n)\}, \qquad (3)$$

where n is the number of original data points and the window size is chosen to be 3, as discussed in the following section.
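The differencing and moving sum steps can be sketched in Python as below. This is an illustrative sketch, not the code used in the study: the function names are assumptions, the differences are taken as current minus previous value (following the wording of the text, so that a falling load gives a negative sum), and the zero-padding at the start of each series is a choice made here to keep all series the same length.

```python
def first_difference(load):
    """Ldiff: current load minus previous load, with a leading 0."""
    return [0.0] + [load[i] - load[i - 1] for i in range(1, len(load))]


def moving_sum(values, window=3):
    """Lsum: sum of the last `window` differences; 0 until the window is full."""
    out = []
    for i in range(len(values)):
        if i < window:
            out.append(0.0)  # not enough real differences yet
        else:
            out.append(sum(values[i - window + 1 : i + 1]))
    return out


# Hypothetical load samples in kg.
load = [100, 98, 95, 90, 92, 97]
l_diff = first_difference(load)
l_sum = moving_sum(l_diff, window=3)
```

With a window of 3, each non-padded entry of `l_sum` equals the change in load over the previous three samples, which is the telescoping form shown in equation (3) up to the sign convention.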

The moving sum method was chosen instead of only analysing the load change, so that any temporary deviations could be smoothed over. The next stage was to determine the window size of the moving sum. Analysis showed that a window size of 2 samples still left significant deviations in the signal, whereas a window of 4 smoothed the signal excessively, making identification of the trough difficult. A window size of 3 was therefore chosen.




Fig 4. Examples of unusual case studies

The next stage was to identify the candidate troughs and peaks, to establish which corresponded to those of interest in Figure 4 (Linterest, troughs 1, 2 and 3). This was achieved by analysing only those troughs where the moving sum had fallen below a certain threshold. If the threshold was low in magnitude, too many troughs would be evaluated, whereas if it was high, too few would be evaluated, with the possibility of missing the trough of interest. Experimentation has shown that a setting of -20 kg for the load variations provided an acceptable threshold.
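The thresholding step can be illustrated with a short Python sketch. The function name and the choice to report only the first sample of each run below the threshold are assumptions made for illustration; the default of -20 kg is the setting quoted above.

```python
def candidate_trough_starts(moving_sums, threshold_kg=-20.0):
    """Indices at which the moving sum first falls below the threshold."""
    starts = []
    below = False
    for i, value in enumerate(moving_sums):
        is_below = value < threshold_kg
        if is_below and not below:
            starts.append(i)  # start of a new run below the threshold
        below = is_below
    return starts
```

A less negative threshold, such as -5 kg, would return more candidates from the same series, which is the trade-off described in the text.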



2.1 Representing the Load Signal Using the Trough Data 

When the candidate sets of troughs in the load data were identified, the loads, heights 
and times of the start, middle and end points were stored along with other relevant 
distance measurements. This is shown in Figure 6 below. 

The differences between each point, using distances and angles, were then calculated and stored. Furthermore, the moving sum for the decrease and rise in the trough was also stored. The distance between the start of the trough and the previous ‘load peak’ was also recorded. Each trough was therefore represented by over 30 measurements. Each case study produced a number of these troughs, identified in the last 50cm of the load signal prior to touchdown.








2.2 Identifying Tinterest

The case studies were randomly separated into 50% training and 50% testing groups. The troughs identified in the training and testing data were then manually labelled as being:

• Tb - Trough before Linterest, i.e. Trough 1 in Figures 3 & 4,

• Tp - Trough being identified, i.e. Trough 2 in Figures 3 & 4, and

• Ta - Trough after Linterest, i.e. Trough 3 in Figure 3.

The aim of this section is to identify Tp because this feature always appears in the case studies examined, and can be used to make a prediction of the time until touchdown. This is not possible for Ta and Tb because inspection showed that they do not always appear in the case studies that exhibited unusual load traces.




The relationship between these troughs and their corresponding representations (as 
shown in Figure 6) must then be determined. Classification techniques were employed 
in finding these relationships, with the C4.5 rule induction technique[2] chosen due to 
its capability of explaining the classification through rules, and because it has been 
successfully applied in a wide variety of areas [3]. 

The training data was used to develop a ruleset to classify the trough type by evaluating the trough representations. The section of the resulting ruleset that predicts whether a trough corresponds to trough Tp is shown below by rules 1 to 3, with reference to Figure 3.

The first figure in the brackets is the number of records satisfying the rule, and the second is the confidence of the classification for these records. Applying the overall rule to the training and testing data gives the results shown in the confusion matrices in tables 1(a) and 1(b).



Fig 6. Representation of a trough (labelled points: previous peak, trough start, trough middle and trough end, each with time, height and load; between points: difference in time, height and load, and angle and length of the difference in height and load)

If  Time_of_start_from_previous_peak <= 0.8 s               (Rule 1)
AND Height_at_start <= 2789 cm
AND Difference_between_end_and_start_time <= 5 s
AND Difference_between_end_and_start_load <= -20 kg
THEN trough is Tp (146, 0.98)

If  Middle_load > 2684 kg                                   (Rule 2)
AND Difference_between_end_and_start_load <= -19 kg
THEN trough is Tp (171, 0.97)

If  Time_of_start_from_previous_peak <= 1.8 s               (Rule 3)
AND Load_drop_from_previous_peak > -2 kg
AND Difference_between_middle_and_start_time > 1 s
AND Difference_between_middle_and_start_time <= 1.8 s
AND Difference_between_middle_and_start_load <= -18 kg
AND Difference_between_end_and_middle_time <= 35 kg
AND Difference_between_end_and_start_load <= 8 s
THEN trough is Tp (121, 0.951)
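For illustration, rules 1 and 2 can be written as simple predicates in Python. The dictionary keys are hypothetical lower-case versions of the field names in the rules, and the function names are assumptions. Rule 3 is left out of this sketch because the units printed against two of its conditions do not match the field names.

```python
def is_tp_rule1(trough):
    """Rule 1: times in seconds, height in cm, load differences in kg."""
    return (
        trough["time_of_start_from_previous_peak"] <= 0.8
        and trough["height_at_start"] <= 2789
        and trough["difference_between_end_and_start_time"] <= 5
        and trough["difference_between_end_and_start_load"] <= -20
    )


def is_tp_rule2(trough):
    """Rule 2: loads in kg."""
    return (
        trough["middle_load"] > 2684
        and trough["difference_between_end_and_start_load"] <= -19
    )


def classify_as_tp(trough):
    """A trough is labelled Tp in this sketch if either rule fires."""
    return is_tp_rule1(trough) or is_tp_rule2(trough)
```

Each `trough` argument is assumed to be a dictionary holding the stored measurements for one candidate trough.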

The confusion matrix illustrates how the rule predictions, in the columns, match the actual data, in the rows. The training data had 180 Tb, 213 Tp and 196 Ta records. The testing data had 189 Tb, 183 Tp and 200 Ta records.








The tables show how the rule classifies the actual data from the training and testing groups. For the training set all Tp troughs were correctly classified, but 5 of the Ta troughs were labelled as being Tp. With the testing data 96% (176 of the 183) of the Tp troughs were correctly labelled, but 16 (8%) of the Ta troughs were labelled as being Tp.

In the online system, once a trough has been identified, the above C4.5 rule can be applied to classify the present trough with the aim of evaluating whether it is before, during or after the peak of interest Linterest.



Table 1(a). Comparison of rule with training data

                        Predicted trough
Training data           Tb      Tp      Ta
Actual trough   Tb      172     0       8
                Tp      0       213     0
                Ta      2       5       189

Table 1(b). Comparison of rule with testing data

                        Predicted trough
Testing data            Tb      Tp      Ta
Actual trough   Tb      183     0       6
                Tp      1       176     6
                Ta      19      16      165
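The percentages quoted for Tp can be recomputed from the confusion matrices with a few lines of Python. The matrices below are copied from tables 1(a) and 1(b) (rows are actual troughs, columns are predicted troughs); the function names `recall` and `precision` are illustrative, and precision is an additional measure not quoted in the text.

```python
LABELS = ["Tb", "Tp", "Ta"]

# Rows: actual trough; columns: predicted trough (order Tb, Tp, Ta).
TRAINING = [[172, 0, 8], [0, 213, 0], [2, 5, 189]]
TESTING = [[183, 0, 6], [1, 176, 6], [19, 16, 165]]


def recall(matrix, label):
    """Fraction of actual `label` troughs that were predicted as `label`."""
    i = LABELS.index(label)
    return matrix[i][i] / sum(matrix[i])


def precision(matrix, label):
    """Fraction of troughs predicted as `label` that really were `label`."""
    j = LABELS.index(label)
    return matrix[j][j] / sum(row[j] for row in matrix)


tp_test_recall = recall(TESTING, "Tp")        # 176 of 183
ta_as_tp_rate = TESTING[2][1] / sum(TESTING[2])   # 16 of 200
```

The row sums also reproduce the record counts given in the text, which is a quick consistency check on the tables.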



2.3 Predicting Touchdown Point 

The Tb, Tp and Ta records were joined together to form one record with approximately 90 fields. Added to this were the touchdown load, height and data value. Furthermore, the distances from Tb, Tp and Ta to touchdown were added. The available fuelling data was separated into 50% training and 50% testing. Since analysis of the trough data showed that Tb and Ta were not always present, the prediction to touchdown can only be achieved from Tp.

Correlation analysis [4], which evaluates the association between variables, showed that the touchdown time is strongly correlated to the period of time from the load of interest Linterest, and the starting time of trough Tp.

Techniques to reduce the number of fields required as inputs for a model 
(dimensionality reduction) are now discussed, with the aim of using a subset of 
the 106 fields available. The decision tree method (DTM)[5] was used to select the 
relevant fields from those presented. This method develops a decision tree and selects 
those fields appearing within it as being those that provide the most information from 
the original data. DTM has been successfully applied to natural language 
processing[6]. By applying DTM to the training data 30 fields were identified. 

Techniques for Touchdown Analysis 

ANNs and multiple linear regression were used to predict the time until touchdown 
from Linterest. The ANN was chosen because it has been successfully applied to many 
different prediction and modelling areas within the nuclear industry [7] and because of 
its capability in modelling complex systems. Linear regression was used to evaluate 
the accuracy of a simple method applied to the problem. Alternative methods of 
evaluation could include knowledge based systems [8]; however, there are no 
qualitative explanations for the phenomena in this analysis, so no expertise was 
available for evaluating this section of the load trace. A further technique that could be 





64 J.A. Steele et al. 



used is model based reasoning [9], since the physical relationships prior to touchdown 
are known. However, no such model exists, and it is felt that creating one 
would prove beneficial following further research. 

The aim therefore was to establish which of these two techniques (ANNs and 
regression) provided the most accurate results of touchdown prediction. 

Neural Networks for Predicting Touchdown 

If all the original 106 fields were used as inputs to an ANN, considerably more 
refuelling data would be required to develop the ANN than was available in the 
training group. The number of training patterns P for a neural network should equal 
the number of weights in the network multiplied by a factor reflecting the noise in the 
data [10], as shown in equation 4 below, 

P = N((I+1)J + (J+1)K) (4) 

where N is proportional to the noise, I is the number of inputs, J is 
the number of hidden units, and K is the number of output units. For precise 
laboratory data N would equal 1, and this would increase for general scientific 
measurements. For this analysis N is greater than one. 
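
To show how quickly the required number of patterns grows with the number of inputs, equation 4 can be evaluated directly. This is a small sketch only: the hidden-layer size of 5 and the noise factor of 2 are assumed values chosen for illustration, not figures from the paper.

```python
def weight_count(inputs, hidden, outputs):
    """Weights (including biases) in a network with one hidden layer."""
    return (inputs + 1) * hidden + (hidden + 1) * outputs


def training_patterns(noise_factor, inputs, hidden, outputs):
    """Equation 4: P = N((I+1)J + (J+1)K)."""
    return noise_factor * weight_count(inputs, hidden, outputs)


# Illustrative values only: the hidden-layer size and noise factor are assumed.
for n_inputs in (106, 30, 9):
    print(n_inputs, training_patterns(2, n_inputs, 5, 1))
```

Under these assumptions, reducing the inputs from 106 to 9 cuts the suggested number of training patterns by roughly an order of magnitude, which is the motivation for the dimensionality reduction described above.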



[Figure 7: multilayer perceptron with an input layer (Height diff, Time diff 2, Height len, Time diff 3, Load drop 1, Rise sum, Load drop 2, Time diff 4, Load), hidden layer 1, hidden layer 2 and an output layer.] 




The multilayer perceptron (MLP) [11] in Figure 7 was developed to predict 
the time until touchdown, using a back-propagation algorithm. Through 
experimentation, the structure of the network was set to 9:2:2:1, with the momentum 
constant, α, set to 0.9 and the initial learning rate, η, set to 0.3. Furthermore, the 
activation function within the neurons was set to the sigmoid function. Of the 30 fields 
initially selected as inputs to the ANN, only 9 were found to be significant (Table 2) 
after sensitivity analysis was performed. 
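
The forward pass of a 9:2:2:1 network with sigmoid neurons can be sketched as follows. This is not the authors' implementation: the weights are random and untrained, the weight range and seed are arbitrary assumptions, and the inputs are assumed to be scaled to the range 0 to 1.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def make_layer(n_in, n_out, rng):
    """One weight row per neuron; the last entry of each row is the bias."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_out)]


def forward(layers, inputs):
    """Propagate an input vector through successive sigmoid layers."""
    activations = list(inputs)
    for layer in layers:
        activations = [
            sigmoid(sum(w * a for w, a in zip(row, activations)) + row[-1])
            for row in layer
        ]
    return activations


rng = random.Random(0)  # fixed seed so the sketch is repeatable
sizes = [9, 2, 2, 1]    # the 9:2:2:1 structure
network = [make_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]
output = forward(network, [0.5] * 9)  # nine inputs assumed scaled to [0, 1]
print(output)
```

Since a sigmoid output lies between 0 and 1, the target time until touchdown would also need to be scaled into that range before training; how the authors handled this is not stated.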




Towards an Estimation Aid for Nuclear Power Plant Refuelling Operations 65 



Table 2. Inputs to the neural network 

Input          Description 
Height diff    The height difference between the start and the middle of Tp 
Time diff 1    The time between the start and middle of Tp 
Height len     The length of the line connecting the start and middle heights of Tp 
Time diff 2    The time between the start and end of Tp 
Load drop 1    The drop in load between the start and end of Tp 
Rise sum       The load value of the moving sum between the middle and end of Tp 
Load drop 2    The drop in load between the start and end of Tp 
Time diff 3    The time between the middle and end of Tp 
Load           The load at the start of Tp 



When the ANN was applied to the training data the root mean squared (RMS) average 
error was 2.29% with a standard deviation of 2.05%. On the testing data the RMS 
average error was 2.53% with a standard deviation of 2.13%. 



Multiple Linear Regression for Predicting Touchdown 

If the same 9 fields identified by sensitivity analysis for the ANN are used in a multiple 
regression model, then on the training data similar results to those of the ANN are 
obtained, with an RMS average error of 2.29% and a standard deviation of 2.08%. With 
the testing data the RMS average error is 2.59% with a standard deviation of 2.31%. 
The coefficients of the fields in the regression model are given in Table 3 below. 



Table 3. Coefficients for the regression model 

Field          Coefficient    Field          Coefficient 
Height diff    0.0045         Rise sum       -0.00021 
Time diff 1    0.239          Load drop 2    0.00328 
Height len     0.0149         Time diff 3    -0.0435 
Time diff 2    -0.129         Load           0.0003195 
Load drop 1    -0.00104       Constant       19.53 



With the regression model providing results that are comparable to those of the ANN, this 
model will be employed because, firstly, it is computationally simpler to evaluate and, 
secondly, the influence of each variable on the results can be determined more readily. 
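
Both points can be seen in a short sketch that evaluates the model from the Table 3 coefficients. The units of the inputs and of the predicted time are not given in this section, so the example input values are placeholders, and the function names are illustrative only.

```python
# Coefficients copied from Table 3; the input values used below are hypothetical.
COEFFICIENTS = {
    "Height diff": 0.0045,
    "Time diff 1": 0.239,
    "Height len": 0.0149,
    "Time diff 2": -0.129,
    "Load drop 1": -0.00104,
    "Rise sum": -0.00021,
    "Load drop 2": 0.00328,
    "Time diff 3": -0.0435,
    "Load": 0.0003195,
}
CONSTANT = 19.53


def predict_time_to_touchdown(fields):
    """Constant plus the coefficient-weighted sum of the nine trough fields."""
    missing = set(COEFFICIENTS) - set(fields)
    if missing:
        raise KeyError(f"missing fields: {sorted(missing)}")
    return CONSTANT + sum(COEFFICIENTS[name] * fields[name] for name in COEFFICIENTS)


def contributions(fields):
    """Per-field contribution, useful for seeing which inputs drive a prediction."""
    return {name: COEFFICIENTS[name] * fields[name] for name in COEFFICIENTS}


example = {name: 1.0 for name in COEFFICIENTS}  # placeholder inputs, units unknown
print(predict_time_to_touchdown(example))
```

The evaluation is a single weighted sum, and the `contributions` helper makes the influence of each field on a given prediction directly visible.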

3 Conclusions 

This paper has described a methodology for accurately predicting when new fuel 
assemblies will touch down within the reactor during refuelling. This involves 
identifying trough features in the load trace as the new fuel assembly approaches 
touchdown. As each new trough is identified, it is evaluated against a rule to 
determine whether it is the trough from which the prediction of time until touchdown 
can be made. When the correct trough has been identified, the regression model 
described can then predict when touchdown will occur. This model has been applied 
to a number of cases with different data input sensitivities, and results indicate that 
it can manage variations in the data through differencing and applying thresholds. 
This illustrates that the approach is suitable for all potential data inputs, indicating that 
it could be reliably used to predict fuel touchdown during refuelling. 

The methodology described can be enhanced by having several such 
techniques analysing the load signal and making predictions until touchdown. This 
approach would enhance the safety aspect of the decision support system by allowing 
comparison between results produced by diverse techniques, thereby not relying on 
one technique to make an accurate prediction in every instance. 

Further work will include the evaluation of additional potential analysis 
techniques. For example, piece-wise linear regression [12] could be applied to the load 
signal as another approach to representing the signal; the relevant peak of 
interest, Linterest, could then be identified and a prediction to touchdown made. Instead of 
identifying troughs in the load signal, the data could be used directly as inputs to an ANN that 
would then make the prediction to touchdown. Principal component analysis [13] 
could be used instead of DTM to reduce the dimensionality of the problem. 
Case-based reasoning [14] could be applied to evaluate the present representation of the load 
signal and attempt to match it to a previous refuelling scenario. Furthermore, this 
approach could use refuelling case studies from the particular fuel channel being 
considered to help target the prediction. 

References 

1. U. M. Fayyad, G. Piatetsky-Shapiro, P. Smyth & R. Uthurusamy, 
Advances in Knowledge Discovery and Data Mining, AAAI Press / MIT 
Press, Cambridge, Massachusetts (1996) 

2. J.R. Quinlan, C4.5: Programs for Machine Learning, Morgan Kaufmann, 
1993. 

3. P. Langley & H. A. Simon, "Applications of Machine Learning and Rule 
Induction", Communications of the ACM, Vol 38, No 11 (1995) 54-64 

4. C. Chatfield, Statistics for technology: A course in applied statistics. 
Chapman & Hall, London (1996) 

5. M. Dash, H. Liu, "Feature Selection for Classification", Intelligent Data 
Analysis, Vol 1 (1997) 131-156 

6. C. Cardie, "Using Decision Trees to improve Case-Based Learning", 
Proceedings of the Tenth International Conference on Machine Learning 
(1993) 25-32 

7. K. Nabeshima, T. Suzudo, K. Suzuki & E. Turkcan, "Real-time Nuclear 
Power Plant Monitoring with Neural Networks", Journal of Nuclear Science 
and Technology, Vol 35, No 2 (1998) 93-100 

8. J. R. McDonald, G. M. Burt, J. S. Zielinski & S. D. J. McArthur, Intelligent 
Knowledge Based Systems in Electrical Power Engineering, Chapman & 
Hall, London, 1997. 

9. W. Hamscher, L. Console & J. DeKleer, Readings in Model Based 
Diagnosis, Morgan Kaufmann (1992) 

10. L. Tarassenko, A Guide to Neural Computing Applications, Arnold, London 
(1998) 







11. S. Haykin, Neural Networks: A Comprehensive Foundation, Macmillan 
College Publishing Company, New York (1994) 

12. V. E. McGee & W. T. Carleton, "Piecewise Regression", Journal of the 
American Statistical Association, September (1970) 1109-1124 

13. C. M. Bishop, Neural Networks for Pattern Recognition, Oxford University 
Press, Oxford (1995) 

14. I. Watson, Applying Case-based Reasoning : Techniques for Enterprise 
Systems, Morgan Kaufmann (1997) 




Drilling Performance Prediction Using General 
Regression Neural Networks 



V. Karri 

School of Engineering, University of Tasmania 
GPO Box 252-65, Hobart, Tasmania, 7001, Australia 
Vishy.Karri@utas.edu.au 



Abstract. The mechanics of cutting approach to drilling performance prediction is 
based on three-dimensional oblique cutting theory and a simpler orthogonal 
cutting data bank. The quantitative reliability of such models depends on 
numerous process variables and on the quantitative accuracy of the data bank for a 
given work material. In this paper a General Regression Neural Network (GRNN) 
architecture is proposed that uses process variables such as tool geometry and 
operating conditions to estimate thrust and torque in drilling. The developed 
networks are tested over a range of process variables to estimate thrust and 
torque. The quantitative accuracy of thrust and torque predictions using the GRNN 
is found to be superior to that of the conventional methods. It is shown in 
this work that, using the GRNN architecture, the drilling forces are predicted 
within 3% of the experimental values. 



1. Introduction 

The mechanics of cutting approach to thrust and torque prediction involves the study of 
tool-workpiece interference and kinematic analysis of the cutting process. This ‘direct’ modelling 
has proved useful from a predictive point of view. However, these models 
were used for individual performance estimation for a set of input process 
parameters. With advances in on-line control of machine tools, there is a need to 
control more than one performance feature. For example, in a drilling operation there is 
a need to estimate thrust, torque, surface roughness and vibrations of the cutting tool 
as performance features for a given set of operating conditions and drill geometry. A 
brief description of the unified mechanics of cutting and empirical approaches to drilling 
performance prediction is given in this work before the capabilities of neural 
network models are presented. 



2. Unified Mechanics of Cutting Analysis and Empirical Models 
for Drilling Performance 

The thin shear zone (plane) analysis for drilling [1-11] uses an elemental technique 
adopted to allow for changes in tool geometry and cutting speed for different points 
on the lips and chisel edge. The cutting action in the lip region was treated as a 
number of elemental ‘classical’ oblique cutting elements [11-12]. The elemental 
deformation forces were then evaluated from the ‘classical’ oblique cutting 
equations [11,13], given the elemental area of cut and the basic cutting data such as 
shear stress and the chip length ratio. The edge forces [13] were also evaluated to give 
the total force on each element. The forces thus found were used to establish the 
elemental thrust and torque. Summing the elemental values of thrust and torque, 
the total thrust and torque generated by the lips during drilling were then 
predicted [13]. 

R. Loganantharaj et al. (Eds.): IEA/AIE 2000, LNAI 1821, pp. 67-73, 2000. 
© Springer-Verlag Berlin Heidelberg 2000 

The cutting edge in the chisel edge region was also divided into a number of 
elements. The chisel edge was approximated to a straight line perpendicular to the 
drill axis and the elemental static chisel edge normal rake angles [11,13] were treated 
as constant for all points on the chisel edge and numerically equal to half of the 
wedge angle at the chisel edge at the drill ‘dead centre’. The chisel edge wedge angle 
could be obtained from measurement of the drill, for the unspecified flank shape of a 
general-purpose drill. Due to the high negative rake angles and low cutting velocities 
encountered at the chisel edge, a discontinuous orthogonal cutting model was 
applied [11,12]. 

The elemental chisel edge length, the mean radius, dynamic angles and cut 
thickness at each element for the selected number of elements could be obtained; 
hence the thrust and torque on the chisel edge could be determined by 
summation of the elemental thrust and torque values. The total thrust and torque on 
the drill as a whole were found by summing the corresponding values in the lip and 
chisel edge regions [12,13-16]. The thrust and torque predictions using the mechanics 
of cutting models were within ±15% of the experimental values when machining S1214 
free-machining steel. 



3. Neural Network Architecture for Thrust and Torque 
Predictions in Drilling 

Neural networks are non-algorithmic, non-digital and intensely parallel systems 
consisting of a number of very simple and highly interconnected processors called 
neurodes, which are analogous to the biological neural cells in the brain [17]. 

The GRNN is a feedforward network requiring supervised training; it is significantly 
different in architecture and algorithm from the back-propagation (BP) model. The GRNN was 
introduced by Donald Specht [18] in 1990 and is based on the previously developed 
Nadaraya-Watson kernel regression [19]. GRNNs feature fast training times, can model 
nonlinear functions, and have been shown to perform well in noisy environments 
given enough data [20]. The primary advantage of the GRNN is the speed at which 
the network can be trained. Training a GRNN is performed in one pass of the training 
data through the network; the training data values are copied to become the weight 
vectors between layers. While the advantages of the GRNN include fast training 
times, the ability to handle both linear and nonlinear data, and the fact that the smoothing 
parameter is the only adjustable parameter, thereby making overtraining less likely, 
the GRNN also has some associated disadvantages. For example, the GRNN requires 
many training samples to adequately span the variation in the data, and it requires that 
all training samples be stored for future use. In addition, the GRNN has trouble with 
irrelevant inputs, and there is no intuitive method for selecting the optimal smoothing 
parameter. 

The architecture of a basic GRNN, shown in Figure 1, has four layers: input, 
pattern, summation and output, with weighted connections Wp between the input and 
pattern layer and Ai and Bi between the pattern and summation layer. There are i input 
neurodes, j pattern neurodes, k+1 summation neurodes and k output neurodes. 




[Figure 1: network diagram with four layers labelled Input Layer, Pattern Layer, Summation Layer and Output Layer.] 

Fig. 1. Basic architecture of GRNN 

In the pattern layer, each new input vector is subtracted from the stored weight 
vector representing each training pattern. Either the squares or the absolute values 
of the differences are summed and fed into a 
nonlinear activation function [18]. The output from all neurodes in the pattern layer 
then becomes input for all neurodes in the summation layer. For a single-output 
network the summation layer consists of a denominator neurode and a numerator 
neurode. For each additional output unit a single numerator is added. Hence, the 
summation layer consists of a single denominator unit and n numerator units, where n 
equals the number of output neurodes. The summation layer neurodes perform a dot 
product between a weight vector and a vector composed of the signals from the 
pattern units [18]. For the denominator summation neurode, the weight vector is 
unity, so a simple sum is performed. For the numerator summation neurode, the 
weight connecting it to each pattern layer neurode is equal to the value of the 
dependent variable for the training case of that pattern layer neurode [21]. 

The outputs from the denominator and numerator summation neurodes are sent to 
the output layer neurodes, the function of which is to divide the output of the 
associated numerator summation neurode by the output of the denominator 
summation neurode [22-24]. 
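
The layer-by-layer description above can be sketched in a few lines. This is a minimal illustration, assuming squared differences and an exponential (Gaussian) activation in the pattern layer, which is the form commonly associated with the GRNN; the text itself only says "a nonlinear activation function". The function name, the smoothing parameter name `sigma` and all data values are hypothetical.

```python
import math


def grnn_predict(train_x, train_y, x, sigma):
    """Estimate the outputs for input x from stored training cases.

    train_x: list of input vectors (the pattern-layer weight vectors)
    train_y: list of output vectors, one per training case
    sigma:   smoothing parameter, the only adjustable parameter
    """
    # Pattern layer: squared distance to each stored case, passed through
    # an exponential activation.
    signals = []
    for stored in train_x:
        dist_sq = sum((a - b) ** 2 for a, b in zip(x, stored))
        signals.append(math.exp(-dist_sq / (2.0 * sigma ** 2)))

    # Summation layer: one denominator (unit weights) and one numerator per
    # output (weights equal to that output's training values).
    denominator = sum(signals)
    n_outputs = len(train_y[0])
    numerators = [
        sum(s * y[k] for s, y in zip(signals, train_y)) for k in range(n_outputs)
    ]

    # Output layer: each numerator divided by the shared denominator.
    return [num / denominator for num in numerators]


# Hypothetical scaled inputs and (thrust, torque) targets, for illustration only.
train_x = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
train_y = [[100.0, 1.0], [200.0, 2.0], [300.0, 3.0]]
print(grnn_predict(train_x, train_y, [0.0, 0.0], sigma=0.1))
```

"Training" here is simply storing `train_x` and `train_y`, which reflects the one-pass training noted earlier. With a very small `sigma` and a query far from every stored case, the denominator can underflow to zero; a practical implementation would need to guard against that.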

In order to reduce the RMS error to a minimum it is necessary to have the correct 
network architecture. One of the most important aspects of layered neural network 
design is choosing the network architecture [22], which is a very 
important consideration for optimal trainability and generalisation ability [23]. For 
feedforward network models this decision involves selecting how many hidden 
layers are necessary and how many neurons are required within each hidden layer and 
within the input and output layers. The decision on the number of neurons in the input 
layer and output layer is straightforward, as they are chosen to be the same as the 
dimensions of the input and output vectors respectively. However, the decision as to 
how many hidden layers and hidden layer neurons to use is more complex. For 
feedforward networks, it has been proven that there is no theoretical reason to ever 
use more than two hidden layers [24]. Although neural networks have been applied to 
certain drilling performance estimation [24], they have not been used for 
simultaneous estimation of both thrust and torque. Choosing an appropriate number 
of hidden layer neurons is also important, as using too few will starve the network of 
the resources it needs to solve the problem, while using too many will increase the 
training time and may cause a problem known as overfitting [24]. Overfitting 
describes the process whereby the network learns the training data well but 
generalises poorly to test data. 

A guideline for selecting the optimum number of hidden layer neurons is to use as 
few as possible to obtain a satisfactory solution. A common method for determining 
the minimum number of hidden layer neurons required is to compare the 
root-mean-square error of the network for an increasing number of neurons. The minimum 
number of neurons that can be used without increasing the associated network error 
represents the number of neurons that should be used. In addition, the selection of the 
activation function is critical, as it also has a significant impact on the RMS error and 
computation time of the network. While the logistic function is a common activation 
function with proven results, the decision of which activation function to use is 
guided by selecting an appropriate function that achieves minimum RMS error or 
minimum computation time, depending on the selection criteria. In this case 5 
neurons in each hidden layer were used for the two outputs, namely the thrust and 
torque. Therefore, for the given input variables, the thrust and torque can be 
estimated simultaneously as multiple outputs by the proposed architecture. 
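
The guideline of using the fewest neurons that do not increase the network error can be expressed as a simple selection rule. The sketch below is one possible reading of that guideline: the `tolerance` parameter and the RMS values are assumptions introduced for illustration and do not come from the paper.

```python
def smallest_adequate_size(rms_by_size, tolerance=0.0):
    """Smallest hidden-layer size whose RMS error is within `tolerance`
    of the best error observed over all candidate sizes."""
    if not rms_by_size:
        raise ValueError("no candidate sizes supplied")
    best = min(rms_by_size.values())
    for size in sorted(rms_by_size):
        if rms_by_size[size] <= best + tolerance:
            return size


# Hypothetical RMS errors from training with 1..8 hidden neurons.
errors = {1: 0.21, 2: 0.12, 3: 0.07, 4: 0.041, 5: 0.030, 6: 0.030, 7: 0.031, 8: 0.030}
print(smallest_adequate_size(errors, tolerance=0.001))
```

A non-zero tolerance trades a small increase in error for a smaller network, which may be preferable when computation time is the selection criterion.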



4. Results and Discussion 

In order to train the network on a comprehensive range of cutting conditions and 
process variables, drilling experiments were carried out using an ANCA automatic drilling 
machine. The thrust and torque were measured 
using a three-component dynamometer and an associated data acquisition system. Taking 
the handbook recommendations and associated feasible drill geometrical features, a 
total of 35 experiments were carried out, and the network was trained on 
these 35 combinations of cutting conditions. All the input variables were scaled 
to the range 0-1. 
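
Scaling to the range 0-1 is commonly done by min-max normalisation of each input variable. The paper does not say which scaling was used, so the following is only a sketch under that assumption, with hypothetical feed values.

```python
def min_max_scale(column):
    """Scale a list of values linearly so the minimum maps to 0 and the maximum to 1."""
    low, high = min(column), max(column)
    if high == low:
        return [0.0 for _ in column]  # constant column: no spread to scale
    return [(value - low) / (high - low) for value in column]


# Hypothetical feed values; the paper does not list its input data.
feeds = [0.10, 0.15, 0.20, 0.30]
print(min_max_scale(feeds))
```

The same minimum and maximum found on the training data would normally be reused when scaling the test conditions, so that both sets share one scale.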

The training was found to give excellent accuracy, with a small error at the training stage 
indicating that the network is well trained with only 8 inputs and meets the target 
thrust and torque accurately. The error was calculated using the deviation formula 
((Exp. - NN)/Exp.) × 100, and the percentage deviations at the training stage were well 
within 5%, with an average percentage deviation of 1.2% for the thrust and 0.8% 
for the torque. It can be seen from Fig. 2a and Fig. 2b that at the training 
stage there is no significant bias towards either over-prediction 
or under-prediction for both thrust and torque. The neural network architecture was then 
tested over 10 different conditions, and the GRNN program was run to check the 
predictive ability of the neural network model at the testing stage. 
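
The deviation formula can be applied as in the sketch below. The averaging over absolute values is an assumption, since the paper does not state how the average percentage deviation was formed, and the thrust values are hypothetical.

```python
def percentage_deviation(experimental, predicted):
    """((Exp. - NN) / Exp.) * 100 for a single measurement."""
    return (experimental - predicted) / experimental * 100.0


def mean_absolute_deviation(experimental, predicted):
    """Average magnitude of the percentage deviation over paired values."""
    deviations = [percentage_deviation(e, p) for e, p in zip(experimental, predicted)]
    return sum(abs(d) for d in deviations) / len(deviations)


# Hypothetical thrust values in newtons, not measurements from the paper.
measured = [1000.0, 1200.0, 1500.0]
estimated = [990.0, 1212.0, 1500.0]
print([round(percentage_deviation(e, p), 2) for e, p in zip(measured, estimated)])
print(round(mean_absolute_deviation(measured, estimated), 2))
```

With this sign convention a positive deviation means the network under-predicts the experimental value and a negative deviation means it over-predicts.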







Figure 3 shows the testing accuracy of the neural network model for 
predicting both thrust and torque. This accuracy of prediction is particularly pleasing 
since, with 8 inputs of desired properties, the developed network makes a decision on 
the type of the cutting tool needed. It can be seen that, of the 10 conditions tested, 
all deviate by well under 5%, with an average percentage deviation of less 
than 2%, confirming the excellent predictive ability of the model. 





[Figures 2 and 3: panels for thrust and torque prediction, plotted against percentage deviation; the plotted values are not recoverable.] 

Fig. 2a - Thrust Prediction at Training    Fig. 2b - Torque Prediction at Training 

Fig. 3a - Thrust Prediction at Testing    Fig. 3b - Torque Prediction at Testing 



5. Conclusion 

The need for reliable and simultaneous prediction of thrust and torque in drilling 
operations is highlighted. The fundamental mechanics of cutting approach to thrust 
and torque prediction in drilling has been found to be complex, with 
numerous process variables involved together with the orthogonal cutting data bank and 
edge forces. The predictive accuracy of this 
traditional approach was found to depend on the reliability of the orthogonal cutting data 
bank and the accuracy of the edge forces. A GRNN architecture is proposed for 
simultaneous prediction of thrust and torque. The GRNN was trained on 35 conditions 
and tested on 10 different cutting conditions. It showed excellent predictive 
capability with a moderately small range of training conditions. This is encouraging, 
since developing reliable cutting models by the mechanics of cutting approach requires a 
more comprehensive range of cutting conditions and more numerous 
process variables than the GRNN architecture. 






References 

1. SOCIETY OF MANUFACTURING ENGINEERS, Tool and Manufacturing 
Engineers Handbook, Edition, McGraw-Hill, New York, 1976. 

2. ARMAREGO, E. J. A. and BROWN, R. H., The Machining of Metals, Prentice- 
Hall, 1969. 

3. AMERICAN STANDARD, U.S.A.S, B94.11-1967. 

4. Chinese National Standards of Measuring and Cutting Tools, Chinese Standard 
Publishing House, 1990. 

5. GALLOWAY, D. F., ‘Some Experiments on the Influence of Various Factors on 
Drill Performance’, Trans. A.S.M.E., Vol. 79, 1957, p 191. 

6. WRIGHT, J. D., ‘A Study of the Geometrical Variability of Manufactured Twist 
Drills’, M. Eng. Sc. Thesis, University of Melbourne, 1975. 

7. MICHELETTI, G. F. and LEVI, R., ‘The Effect of Several Parameters on Twist 
Drill Performance’, Proc. of 8th Int. M.T.D.R. Conf., University of Manchester, 
Sept., 1967. 

8. PRAMANIK, D. K., ‘Variables Affecting Drill Performance’, PhD Thesis, 
University of Melbourne, 1988. 

9. GALLOWAY, D. F. and MORTON, I. S., ‘Practical Drilling Tests’, Research 
Dept., The Inst. of Prod. Eng. (U.K.), 1946. 

10. ARMAREGO, E. J. A. and WHITFIELD, R. C., ‘Computer Based Modelling of 
Popular Machining Operations for Force Prediction’, Annals CIRP, Vol. 34, 
1985, p 65. 

11. WHITFIELD, R. C., ‘Force Prediction in Machining’, PhD Thesis, University of 
Melbourne, 1986. 

12. KARRI, V., ‘Fundamental Studies of Rotary Tool Cutting Process’, PhD Thesis, 
University of Melbourne, 1991. 

13. ZHAO, H., “Predictive Models for Forces, Power and Hole Oversize in Drilling 
Operations”, PhD Thesis, The University of Melbourne, 1994. 

14. BOSTON O. W., and GILBERT, W. W., ‘The Torque and Thrust of Small Drills 
Operating in Various Metals’, Trans. A.S.M.E., Vol. 58, N2, 1936, p 79. 

15. PAL, A. K., BHATTACHARYYA, A. and SEN, G. C., ‘Investigation of the 
Torque in drilling ductile materials’, Int. J. Mach. Tool Des. Res., Vol. 4, 1965, 
p 205. 

16. WILLIAMS, R. A., ‘A Study of the Mechanics of the Drilling Process’, Harold 
Armstrong Conf. on Prod. Sci. Inst. of Engrs., Australia, Melbourne, 1971. 

17. CAUDILL, M. and BUTLER, C., “Understanding Neural Networks - Computer 
Explorations”, vol. 1: Basic Networks, Massachusetts Institute of Technology, 
1992. 

18. SPECHT, D. F., “General Regression Neural Networks”, Institute of Electrical 
and Electronic Engineers Transactions on Neural Networks, vol. 2, no. 6, Nov. 
1991, pp. 568-576. 

19. SARLE, W., “FAQ for comp.ai.neural-net. What is a GRNN?”, part 2, 
ftp://ftp.sas.com/pub/neural/FAQ.html, 1997. 

20. SHAFFER, R., “General Regression Neural Networks”, http://cheml.nrl.navy/ 
-shatter/grnn.html, 17th Nov. 1998. 







21. ZURADA, J. M., “Introduction to Artificial Neural Systems”, West Publishing 
Company, 1992. 

22. YU, X., LOH, N. K., JULLIEN, G. A. and MILLER, W. C., “Comparisons of 
Four Learning Algorithms for Training the Multi-Layer feed forward Neural 
Networks with Hard Limiting Neurons”, Neural Networks Theory, IEEE, New 
York, 1996. 

23. MASTERS, T., “Practical Neural Network Recipes in C++”, Academic Press 
Inc., 1993. 

24. LUI, T. I., ANANTHARMAN, K. S., ‘Intelligent Classification and 
Measurement of Drill Wear’, Journal of Engineering for Industry, Transactions of 
ASME, vol. 116, 1994, pp 392-397. 




Identifying Significant Parameters for Hall-Heroult 
Process Using General Regression Neural Networks 



F. Frost¹ and V. Karri² 

¹ Comalco Aluminium Limited, PO Box 290, George Town, Tasmania, 7253, Australia 
Fred.Frost@comalco.riotinto.com.au 
² School of Engineering, University of Tasmania, Hobart, Tasmania, 7001, Australia 
Vishy.Karri@utas.edu.au 



Abstract. While there are many neural network models that are suitable 
for a particular application, each model will yield different accuracy when 
applied and, moreover, the features of the training data that are used by the 
neural networks will differ in each instance. When a neural network is initially 
trained for a specific application, some of the features of the training data are 
not significant to the network’s decisions, while other features are critical. 
Further, there is a cost and difficulty of measurement associated with the 
collection of process parameter data to be used as inputs in the neural network 
model. Hence, from an industry point of view, it is beneficial to include only 
value-adding parameters in the neural network model, to avoid expenditure in 
collecting irrelevant process data that can be omitted without compromising the 
model accuracy. In this paper, a technique is used to identify the significant 
process parameters that are required for a particular application. Using a 
practical application in the smelting industry, non-contributing variables are 
removed from the neural network model to achieve an improvement in 
prediction accuracy. 



1. Introduction 

The Hall-Heroult process [1] was discovered in 1889 and remains to date the only 
known viable method for the production of aluminium. In this electrolysis process [2] 
pure alumina is dissolved in an electrolyte of molten cryolite, in large electrolytic 
furnaces, called reduction cells [3]. The major components of the reduction cell are 
shown in Figure 1 to aid an explanation of the Hall-Heroult process. By means of a 
carbon anode suspended in the electrolyte, electric current is passed through the 
electrolyte mixture causing metallic aluminium to be deposited on the carbon cathode 
at the bottom of the cell. The heat generated by passage of this electric current keeps 
the electrolyte molten, so that alumina can be added as necessary to make the process 
continuous. Molten aluminium is removed periodically from the reduction cell, via a 
vacuum syphon and large crucible, as shown in Figure 1. This molten aluminium is 
then solidified into ingot and billet forms and shipped to customers for value-added 
manufacturing into such products as aluminium foil, aluminium drink containers and 
aluminium components for ship and aircraft construction. Aluminium is a useful 
material for a broad range of applications because of its unique combination of 
properties, making it one of the most versatile, economical 
and attractive metallic materials available [4]. 

R. Loganantharaj et al. (Eds.): IEA/AIE 2000, LNAI 1821, pp. 73-78, 2000. 

© Springer-Verlag Berlin Heidelberg 2000 








[Figure 1: reduction cell diagram with labels Alumina supply hopper, Carbon lining cathode, Cryolite bath and Molten aluminium.] 

Fig. 1. Basic Components of the Reduction Cell Highlighting Removal of Aluminium [5] 

While molten cryolite is the main ingredient of the Hall-Heroult process, 
electrolyte additives are used to lower the liquidus temperature of the electrolyte and 
hence permit operation of the cell at temperatures below the melting point of pure 
cryolite [6]. The reduction in operating temperature is beneficial in that savings in 
power consumption and improvements in current efficiency can be realised [7]. In 
order to maximise process efficiency it is critical that the correct quantity of 
electrolyte additives be scheduled for the reduction cell. In order to achieve this, 
process modelling is largely beneficial. In particular, neural networks [8-11] are 
useful to develop a process model and subsequently use the developed model to 
determine the correct quantity of electrolyte additives to schedule to each reduction 
cell. To satisfy this objective the general regression neural network, GRNN, is applied 
in this instance. The GRNN was discovered by Donald Specht [12] in 1990 and is 
based on the previously developed Nadaraya- Watson kernel regression [13]. GRNN’s 
feature fast training times, can model nonlinear functions, and have been shown to 
perform well in noisy environments given enough data [14]. The primary advantage 
of the GRNN is the speed at which the network can be trained. Training a GRNN is 
performed in one pass of the training data through the network, the training data 
values are copied to become the weight vectors between the network layers [15-17]. 
The architecture of a basic GRNN, shown in Figure 2, has four layers; input, pattern, 
summation and output, with weighted connections Wji between the input and pattern 
layer and Aj and between the pattern and summation layer. There are i input 
neurodes, j pattern neurodes, k+1 summation neurodes and k output neurodes. The 
function of the input layer is to pass forward the activity patterns presented to the 
network to all neurodes in the pattern layer. The neurodes in the pattern layer perform 
a nonlinear transformation of the input patterns. When a new vector X is entered into 
the network, it is subtracted from the stored weight vector representing each activity 
pattern. Either the squares or the absolute values of the differences are summed and 
fed into a nonlinear activation function [16]. The output from all neurodes in the 





Identifying Significant Parameters for Hall-Heroult Process 75 



pattern layer then becomes input for all neurodes in the summation layer. For a single 
output network the summation layer consists of a denominator neurode and a 
numerator neurode. For each additional output unit a single numerator is added. 
Hence, the summation layer consists of a single denominator unit and n numerator 
units, where n equals the number of output neurodes. The summation layer neurodes 
perform a dot product between a weight vector and a vector composed of the signals 
from the pattern units [16]. For the denominator summation neurode, the weight 
vector is unity, so a simple sum is performed. For the numerator summation neurode, 
the weight connecting it to each pattern layer neurode is equal to the value of the 
dependent variable for the training case of that pattern layer neurode [17]. The output 
from the denominator and numerator summation neurodes are sent to the output layer 
neurodes, the function of which is to divide the output of the associated numerator 
summation neurode by the output of the denominator summation neurode. 
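
The layer-by-layer description above can be sketched for a single-output network as follows. This is a minimal illustration, not the authors' implementation: the function name `grnn_predict`, the choice of squared differences, and the Gaussian form of the nonlinear activation are assumptions made for the example.

```python
import math

def grnn_predict(train_x, train_y, x, sigma=0.1):
    """Predict one output for input vector x with a basic GRNN (sketch)."""
    # Pattern layer: one neurode per stored training pattern. The squared
    # differences are summed and fed into a Gaussian kernel (assumed form).
    activations = []
    for w in train_x:
        dist_sq = sum((xi - wi) ** 2 for xi, wi in zip(x, w))
        activations.append(math.exp(-dist_sq / (2.0 * sigma ** 2)))
    # Summation layer: the denominator neurode uses unit weights, the
    # numerator neurode uses the dependent-variable value of each pattern.
    denominator = sum(activations)
    numerator = sum(a * y for a, y in zip(activations, train_y))
    if denominator == 0.0:
        raise ValueError("all pattern activations underflowed; increase sigma")
    # Output layer: divide the numerator by the denominator.
    return numerator / denominator
```

Training in this sketch is the single pass described above: the training patterns are simply stored and used as weights, so `sigma` is the only value that needs tuning.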

[Figure 2: network diagram. Labels: Numerator; Input Layer, Pattern Layer, Summation Layer, Output Layer]

Fig. 2. Basic Architecture of the General Regression Neural Network 

To further the understanding of process behaviour in the aluminium smelting 
industry it is important to determine the features of the training data that are used by 
the neural network during process modelling. When a neural network is initially 
trained for a specific application, some of the features of the training data are not 
significant to the network’s decisions, while other features are critical. Hence, it 
becomes necessary to have some measure of importance to distinguish contributing 
and non-contributing features of the training data. One of the most common measures 
of input variable importance is predictive importance, which is concerned with the 
increase in generalisation error when an input variable is omitted from the training 
and test data sets [18]. An analysis using predictive importance is completed by 
initially training the network using all inputs, i. The network is then retrained with a 
single input omitted from the model to study the change in the network error; 
hence, i - 1 inputs are used. This is completed i times with a different input omitted in 
each instance. The resulting change in error in each instance is a direct measure of 
predictive importance. An increase in error indicates the omitted input is adding value 
to the network decision, while a decrease in error, or no change, indicates the omitted 
input was not contributing to the network prediction. As a measure of error, the root- 
mean-squared, RMS, error is an adequate and commonly used error measure in neural 
network modelling and hence, is used here [18]. 
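
The leave-one-input-out procedure described above can be sketched as follows. The code is illustrative only: it uses a compact Gaussian-kernel GRNN as the model, and the helper names (`rms_error`, `drop_column`, `predictive_importance`) and the default `sigma` are assumptions, not taken from the paper.

```python
import math

def grnn_predict(train_x, train_y, x, sigma=0.5):
    # Compact GRNN with an assumed Gaussian kernel; stands in for the model.
    acts = [math.exp(-sum((a - b) ** 2 for a, b in zip(x, w)) / (2 * sigma ** 2))
            for w in train_x]
    return sum(a * y for a, y in zip(acts, train_y)) / sum(acts)

def rms_error(train_x, train_y, test_x, test_y):
    """Root-mean-squared error of the model on the test patterns."""
    sq = [(grnn_predict(train_x, train_y, x) - y) ** 2
          for x, y in zip(test_x, test_y)]
    return math.sqrt(sum(sq) / len(sq))

def drop_column(rows, k):
    """Return the patterns with input k omitted."""
    return [row[:k] + row[k + 1:] for row in rows]

def predictive_importance(train_x, train_y, test_x, test_y):
    """Return the baseline RMS error and the change when each input is omitted."""
    baseline = rms_error(train_x, train_y, test_x, test_y)
    changes = []
    for k in range(len(train_x[0])):
        err = rms_error(drop_column(train_x, k), train_y,
                        drop_column(test_x, k), test_y)
        changes.append(err - baseline)  # positive: input k was adding value
    return baseline, changes
```

A positive entry in `changes` marks a contributing input; an entry that is zero or negative marks an input that can be considered for removal.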




76 F. Frost and V. Karri 



2. Development of Neural Network Training and Test Data for 
Electrolyte Additive Prediction Application 

There are twelve process parameters selected as potential neural network inputs for 
this electrolyte additive prediction application, while the quantity of two particular 
electrolyte additives to add to the reduction cell form the outputs of this model, as 
shown in Figure 3. 

Network inputs (feeding an arbitrary neural network architecture):

Cell stability 
Cell efficiency 
Anode displacement 
Current flow 
Potential change 
Sludge level 
Volume indicator 
Amperage 
Surface roughness 
Surface displacement 
Sludge density 
Cell identification 

Fig. 3. Illustration of Process Parameters Used for Neural Network Modelling for Electrolyte Additive Prediction Application 

Data acquisition from the smelter knowledge base with subsequent data formatting 
and pre-processing produced 1,565 data patterns in total for this industrial application. 
Data preparation formed a complex stage of network modelling, including the 
elimination of corrupt information from the acquired data, ensuring sufficient data for 
each model variable was included in the data patterns, covering the entire operating 
range for each parameter, and scaling each variable between the minimum and 
maximum limits of its operating range. The processed data was then divided into 
1,365 training data patterns, with the remaining 200 patterns used as test data. The 
train and test data sets were carefully selected to ensure the network output variables 
were well represented in each data set, covering the entire range of values for each 
parameter. 
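
The scaling and partitioning steps can be illustrated with a short sketch. The paper does not state the target interval or the selection rule, so scaling to [0, 1] and picking test patterns evenly across the sorted output values are assumptions; the function names are hypothetical.

```python
def min_max_scale(column, lo, hi):
    """Scale values linearly so the operating limits lo and hi map to 0 and 1."""
    if hi <= lo:
        raise ValueError("hi must exceed lo")
    return [(v - lo) / (hi - lo) for v in column]

def split_covering_range(patterns, targets, n_test):
    """Pick n_test patterns spread evenly over the sorted target values."""
    if not 0 < n_test <= len(patterns):
        raise ValueError("n_test must be between 1 and the number of patterns")
    order = sorted(range(len(patterns)), key=lambda i: targets[i])
    step = len(order) / n_test
    # step >= 1, so consecutive picks are always distinct indices.
    test_idx = {order[int(j * step)] for j in range(n_test)}
    train = [p for i, p in enumerate(patterns) if i not in test_idx]
    test = [p for i, p in enumerate(patterns) if i in test_idx]
    return train, test
```

With 1,565 patterns and `n_test=200`, this returns 1,365 training patterns and 200 test patterns whose outputs span the full range of values.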

3. Neural Network Modelling Results for Electrolyte Additive 
Prediction Application 

In order to develop an accurate model of the Hall-Heroult process using the GRNN it 
is necessary to study RMS error behaviour with changing network architecture. In 
particular, the adjustable parameters of the GRNN that influence prediction accuracy 
are the number of pattern layer nodes used and the value of the receptive field 
width, σ. The improvement on the RMS error obtained with the GRNN model is 
achieved as a result of the predictive importance analysis. Omitting each input 
variable individually from the neural network model and observing the resulting RMS 
error highlights the value that the omitted input was contributing. This analysis is 
completed using the optimum network conditions determined in the previous phase, 
in particular, 600 pattern layer nodes and a σ value of 0.1. Of the potential 
twelve neural network inputs, only cell stability, cell efficiency, current flow, sludge 
level and cell identification are contributing to the network prediction. Therefore, the 
remaining input variables are removed from the model and the GRNN retrained. This 
results in an improved RMS error of 0.0548 and 0.0671 for the train and test data sets, 
respectively. The results of the predictive importance analysis are useful to determine 
the percentage contribution of the input variables. This is calculated by summing the 
values of the error increase when an input variable is omitted from the train and test 
data sets and dividing the error change for each variable by the error sum. This yields 
a percentage contribution of each input variable towards the total error. The 
percentage contribution of the input variables, based on the results of the predictive 
importance technique, is shown in Figure 4. 
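
The percentage calculation described above amounts to normalising the error increases so that they sum to 100. A minimal sketch follows; the function name and the numbers in the usage note are hypothetical and are not results from this study.

```python
def percentage_contribution(error_increase):
    """Map each input name to its share (in percent) of the summed error increase."""
    total = sum(error_increase.values())
    if total <= 0:
        raise ValueError("error increases must sum to a positive value")
    return {name: 100.0 * inc / total for name, inc in error_increase.items()}
```

For example, with made-up increases `{"input_a": 0.03, "input_b": 0.01}` (train and test increases already summed per input), `input_a` accounts for three quarters of the total and `input_b` for one quarter.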



[Figure 4: pie chart of percentage contributions. Legible labels: Cell identification 10.9%, Cell efficiency 10.5%, and one further segment of 19.4%]

Fig. 4. Percentage Contribution of Input Variables for GRNN Network Based on Predictive Importance Analysis 

Hence, it is shown that cell stability is the most significant input, followed by 
sludge level, current flow, cell identification and, finally, cell efficiency, the least 
significant of the contributing input variables. This ranking of importance of the input 
parameters highlights that the network prediction is influenced mostly by cell 
stability. In addition, sludge level and current flow have a similar influence, as do cell 
identification and cell efficiency. 



4. Conclusions 

It has been shown that identifying the significant process parameters that contribute to 
the prediction accuracy of neural network models has two major advantages. Firstly, 
removing non-contributing inputs from the model reduces the complexity of the 
neural network model. Secondly, the cost and difficulty associated with obtaining 
values for process parameters used in the neural network model is reduced by using 
only those inputs that contribute to improving the model accuracy. It has been shown 
in this work that neural networks are of significant benefit in the aluminium smelting 
industry to develop process models. Identifying significant parameters is particularly 
useful as it eliminates the need to spend significant time studying the complex 
chemistry and physics associated with the Hall-Heroult process. 



References 

1. GRJOTHEIM, K. and KVANDE, H., “Understanding the Hall-Heroult Process 
for Production of Aluminium”, Aluminium-Verlag, Dusseldorf, 1986. 

2. HAUPIN, W. E., “Principles of Aluminium Electrolysis”, Proc. 124th TMS 
Annual Meeting, Las Vegas, Feb. 12-16, 1995, pp. 195-203. 

3. KENIRY, J., “Outline of the Reduction Process”, Proc. Aluminium Smelting 
Fundamentals, Course 1, Comalco Aluminium Limited, 1994. 

4. BOYER, H. E. and HALL, T. L., “Metals Handbook”, Desk Edition, American 
Society for Metals, 1985. 

5. TOMAGO ALUMINIUM, “The Aluminium Production Process”, http://www. 
tomago.com.au/aluminium.html, Feb. 24, 1999. 

6. GRJOTHEIM, K. and WELCH, B. J., “Aluminium Smelter Technology”, 
Aluminium-Verlag, 1988. 

7. MATHEOU, N., “Electrolyte Control in Aluminium Cell”, Proc. Al. Fund., 1994. 

8. CAUDILL, M. and BUTLER, C., “Naturally Intelligent Systems”, MIT, 1990. 

9. CAUDILL, M. and BUTLER, C., “Understanding Neural Networks - Computer 
Explorations”, vol. 1, Massachusetts Institute of Technology, 1992. 

10. HERTZ, J., KROGH, A. and PALMER, R. G., “Introduction to the Theory of 
Neural Computation”, Addison-Wesley Publishing Company, 1991. 

11. ZURADA, J. M., “Introduction to Artificial Neural Systems”, West Publishing, 
1992. 

12. SPECHT, D. F., “General Regression Neural Networks”, Institute of Electrical 
and Electronic Engineers Transactions on Neural Networks, vol. 2, no. 6, Nov. 
1991, pp. 568. 

13. MASTERS, T., “Advanced Algorithms for Neural Networks: A C++ 
Sourcebook”, John Wiley and Sons, 1995. 

14. SHAFFER, R., “General Regression Neural Networks”, http://cheml.nrl.navy/ 
-shatter/grnn.html, 1998. 

15. SARLE, W., “FAQ for comp. ai. neural-net. What is a GRNN?”, part 2, 
ftp://ftp.sas.com/pub/neural/FAQ.html, 1997. 

16. YU, X., LOH, N. K., JULLIEN, G. A. and MILLER, W. C., “Comparisons of 
Four Learning Algorithms for Training the Multi-Layer Feed Forward Neural 
Networks with Hard Limiting Neurons”, Neural Networks Theory, IEEE, New 
York, 1996. 

17. MASTERS, T., “Practical Neural Network Recipes in C++”, Academic Press 
Inc., 1993. 

18. FROST, F. and KARRI, V., “Determining the Influence of Input Parameters on 
BP Neural Network Output Error Using Sensitivity Analysis”, Proc. ICCIMA, 
New Delhi, INDIA, Sep. 1999. 




Mapping Object-Oriented Systems to Distributed 
Systems Using Data Mining Techniques 



Miguel A. Serrano 1, Doris L. Carver 1, and Carlos Montes de Oca 2 

1 Department of Computer Science, Louisiana State University 
Baton Rouge, Louisiana, 70803, USA 
{masv, carver}@bit.csc.lsu.edu 
2 Centro de Investigación en Matemáticas 
Apdo. Postal 402, Guanajuato, Gto, 36000, Mexico 
moca@cimat.mx 



Abstract. We present a reengineering approach for decomposing existing 
object-oriented systems into subsystems that have low coupling and are suitable 
for distribution. We use reverse engineering techniques for the architectural and 
design recovery. We use object-oriented metrics techniques for the assessment 
of relationships and interactions between object-oriented constructs such as 
classes, objects, and methods. Next, we use data mining techniques to discover 
associations in the underlying system and clustering techniques to create a 
hierarchical grouping of subsystems that is convenient for guiding the 
allocation of the subsystems to a hierarchical network. Finally, we efficiently 
allocate subsystems to different sites by mapping the hierarchical 
decomposition of subsystems to a hierarchical network representation. For the 
implementation, we use middleware technologies. 



1 Introduction 

In the last decade, there has been increasing use of the object-oriented paradigm and 
increasing interest in client-server and distributed systems technologies. Many 
approaches for the development of distributed object systems that target efficient 
distribution of objects, such as [6] and [2], start from the specifications or logical 
design of the software system; however, there are numerous legacy systems for 
which there are no high-level specification documents. There is a need for tools and 
methodologies that assist with the migration of these applications to new platforms 
and distributed environments. 

In this paper, we present a reverse engineering, metrics, and data mining driven 
approach for decomposing existing object-oriented systems into low-coupled 
subsystems that are suitable for distribution. Section 2 includes the related work. 
Section 3 explains the subsystem decomposition approach, and Section 4 presents the 
summary. 



R. Loganantharaj et al. (Eds.): IEA/AIE 2000, LNAI 1821, pp. 79-84, 2000. 
© Springer-Verlag Berlin Heidelberg 2000 




80 Miguel A. Serrano et al. 



2 Related Work 

There are several approaches to migrate legacy systems to newer programming 
paradigms and technologies. One approach is the total redevelopment of the legacy 
system starting with new specifications. Another approach is to first apply reverse 
engineering techniques to existing code to recover the design model and then to 
develop a new system by possibly reusing some parts of the legacy system [4]. 

Sneed and Majnar [8] report experiences on migrating legacy systems to object 
technologies and client/server systems. They use wrapping at different levels of 
encapsulation such as job, transaction, program, module, and procedure. De Lucia [4] 
presents an approach to migrate legacy systems to object-oriented platforms. Reverse- 
engineering techniques are used for object identification. The identified objects are 
encapsulated with wrappers and are then incrementally translated as "legacy-objects" 
to object-oriented platforms. 

Purao proposes an approach for systematically deriving distributable units from an 
object-oriented system and effectively distributing them to processors. The logical 
specifications of the object-oriented system, usage patterns, semantic associations, 
and usage location help to derive an efficient decomposition and allocation of the 
system [6]. Bastarrica proposes an architectural specification to serve as the basis for 
obtaining optimal distribution of object-oriented application components over a target 
network that minimizes remote communication between components. The Binary 
Integer Programming (BIP) model with constraints such as storage and 
communication provides optimal distribution [2]. 

Briand provides a unified framework for coupling and cohesion metrics in object- 
oriented systems by reviewing existing frameworks and by providing a standardized 
terminology and formalism for expressing new and existing measures in a fully 
consistent and operational manner [3]. 



3 The Mapping Approach 

In this section, we describe our approach for the mapping of object-oriented systems 
to distributed systems. It consists of the following steps: 

1. Apply reverse engineering and design recovery techniques to obtain the software 
architecture of the underlying object-oriented code. We use the LLSA [7] as the 
architectural representation model. 

2. Generate sets containing object-oriented constructs such as classes, objects, and 
methods linked by some relationship such as coupling. Object-oriented metrics 
techniques are used in this step. 

3. Create a hierarchical subsystem decomposition of the system, using data mining 
and clustering techniques. 

4. Allocate subsystems to processing units, and implement the distributed system 
using middleware technologies such as CORBA and IDL. 

We describe each of the steps in sections 3.1 - 3.4. 







3.1 Software Architecture and Design Recovery of the System 



To get the software architectural model, the code of the original system must be 
parsed and techniques for the analysis of the code must be applied. We use the Low 
Level Software Architecture Model (LLSA) developed by Shrivastava [7] to represent 
the target system. 

The LLSA model captures physical and logical dependencies between object- 
oriented software programming constructs and design abstract concepts (e.g., classes, 
objects, and methods). We call these programming constructs entities. In LLSA, each 
entity represents logical concepts in the design of the system. Each entity has an 
interface that describes its static and dynamic behavior. Interactions connect two 
entities and define an existing relationship in the underlying paradigm (e.g. a function 
call is the interaction of two functions). Interactions are classified as static and 
dynamic. 

Fig. 1 shows an example of the LLSA. In this figure, the sample code appears on 
the left and the LLSA text representation for the class entities appears on the upper right. 
For a comprehensive description of the LLSA model, refer to [7]. 



3.2 Creation of Relationships Sets among Entities 



The next step is the decomposition of an object-oriented system into subsystems that 
are suitable for distribution. We consider the class as the unit of distribution. We are 
especially interested in relationships present in the system that produce 
communication and dependence between classes. We use object-oriented coupling 
metrics to discover these relationships. Coupling measures the strength of the 
association among modules/classes. We use the CBO and DAC object-oriented 
metrics. CBO and DAC as defined in [3] are shown in Fig. 2. 

We use coupling metrics to generate metric sets. Metric sets consist of pairs of 
coupled object-oriented entities. In Fig. 2, the object-oriented metrics (CBO and 
DAC) are defined as the size of the respective metric sets (CBOSet and DACSet). In 
this step, we are not interested in the size of the metric set, but rather in the elements 
of the metric set. The metric sets give us information about the interaction between 
one class and all the other classes in the system. The importance of using object- 
oriented metrics as opposed to only using information from the design recovery is that 
object-oriented metrics take into account features such as inheritance, polymorphism, 
and transitive relationships. Finally, we produce interaction matrices from the metric 
sets. We generate matrices of interacting classes. For interacting classes, the rows and 
the columns contain the classes in the system. The intersection of row i and column j 
has the value 1 if class i and class j belong to the metric sets (i.e., if the two 
classes have some interaction). The intersection has the value of 0 otherwise. The 
lower right part of Fig. 1 shows an example of the CBO interaction matrix. 
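
The construction of an interaction matrix from metric sets can be sketched as follows. The function name, the dictionary layout of `metric_sets`, and the decision to make the matrix symmetric are assumptions made for the illustration.

```python
def interaction_matrix(classes, metric_sets):
    """Build a 0/1 matrix whose rows and columns follow the order of `classes`."""
    index = {name: k for k, name in enumerate(classes)}
    n = len(classes)
    matrix = [[0] * n for _ in range(n)]
    for c, coupled in metric_sets.items():
        for d in coupled:
            # Assumed symmetric: an interaction marks both (c, d) and (d, c).
            matrix[index[c]][index[d]] = 1
            matrix[index[d]][index[c]] = 1
    return matrix
```

For three hypothetical classes `["A", "B", "C"]` with metric sets `{"A": {"B"}, "B": {"A"}, "C": set()}`, only the A/B cells are set to 1.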







3.3 Subsystem Decomposition 

We use techniques from ISA [5] to obtain the subsystem decomposition of the object- 
oriented system. ISA combines data mining and clustering techniques to obtain a 
hierarchical subsystem decomposition of a system. ISA identifies subsystems in three 
steps: 

1. Build a database view of the system. Based on the interaction matrices and metric 
sets, we create a view of the system. 

2. Perform data mining. Use a data mining algorithm to mine association rules over 
the data base view of the system. 

3. Consolidate and interpret results. Apply clustering techniques to the outcome of the 
mining process to produce a hierarchical subsystem decomposition of the system. 

Agrawal, Imielinski, and Swami [1] introduced the problem of mining association 
rules from large databases of transactions. Formally, the problem of mining 
association rules is defined in [1] as follows: Let $I = \{i_1, i_2, \ldots, i_m\}$ be a set of items. 
$D$ is a set of transactions $R$ such that $R \subseteq I$. Additionally, $R$ contains $X$ if $X \subseteq R$. An 
association rule is an implication $X \Rightarrow Y$, where $X \subset I$, $Y \subset I$, and $X \cap Y = \emptyset$. The 
rule $X \Rightarrow Y$ holds in $D$ with confidence $c$ if $c\%$ of transactions in $D$ that contain $X$ 
also contain $Y$. In addition, the rule $X \Rightarrow Y$ has support $s$ if $s\%$ of the transactions 
in $D$ contain $X \cup Y$. 
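
The two rule measures defined above can be computed directly from a list of transactions. The sketch below is a plain restatement of the definitions, not a mining algorithm; the function names and the representation of transactions as Python sets are assumptions.

```python
def support(transactions, itemset):
    """Percentage of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return 100.0 * hits / len(transactions)

def confidence(transactions, x, y):
    """Percentage of transactions containing X that also contain Y."""
    x, y = set(x), set(y)
    with_x = [t for t in transactions if x <= set(t)]
    if not with_x:
        return 0.0  # the rule is never triggered
    both = sum(1 for t in with_x if y <= set(t))
    return 100.0 * both / len(with_x)
```

The support of a rule X => Y is obtained as `support(transactions, set(x) | set(y))`.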



[Figure 1, three panels. Left: sample code defining four classes, C1, C2, C3 (extends C1) and C4 (extends C2), whose methods (c1m1 to c1m3, c2m1, c3m1, c3m2, c4m1, c4m2) declare and call objects of the other classes. Upper right: the LLSA text representation of each class entity, listing its static interface, dynamic interface and static interactions (ancestor and descendant classes; C3 descends from C1 and C4 descends from C2). Lower right: the CBO interaction matrix with rows and columns C1, C2, C3, C4.]

Fig. 1. Example of LLSA and interaction matrix 

The information from the previous step (i.e., metric sets and interaction matrices) 
is the basis for the construction of the database view of the system used by the mining 
algorithm. In that context, each row in the interaction matrix is a transaction. 








Association rules help to identify a set of related items in a large set of transactions. In 
the context of metric sets, an association rule can be used to identify sets of related 
classes (e.g., c% of classes that use class X also use class Y). 



Let $c \in C$, $d \in C$, where $C$ is the set of all classes.

CBO(c): (coupling between objects) Count of the number of classes to which class c is coupled. A class is coupled to 
another if methods in one class use methods or attributes of the other.

$CBO(c) = |CBOSet(c)|$

$CBOSet(c) = \{ d \in C - \{c\} \mid uses(c, d) \lor uses(d, c) \}$

uses(c, d): A class c uses a class d if a method implemented in class c references a method or attribute implemented in 
class d. It can be defined for dynamic method invocations and for static method invocations.

DAC(c): (data abstraction coupling) Number of noninherited attributes in class c having a class as their type.

$DAC(c) = |DACSet(c)|$

$DACSet(c) = \{ a \mid a \in A_I(c) \land T(a) \in C \}$

where $A_I(c)$ are the attributes of class c and $T(a)$ is the type of the attribute.



Fig. 2. CBO and DAC metric definitions 
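
The set definitions in Fig. 2 translate almost directly into code. In the sketch below the `uses` relation is given as a set of ordered pairs and the attributes of each class as a mapping from attribute name to type name; both representations, and the function names, are assumptions made for the illustration.

```python
def cbo_set(c, classes, uses):
    """CBOSet(c): classes d != c with uses(c, d) or uses(d, c)."""
    return {d for d in classes
            if d != c and ((c, d) in uses or (d, c) in uses)}

def dac_set(c, attributes, classes):
    """DACSet(c): attributes of c whose declared type is a class in C."""
    return {a for a, t in attributes.get(c, {}).items() if t in classes}

def cbo(c, classes, uses):
    """CBO(c) is the size of CBOSet(c)."""
    return len(cbo_set(c, classes, uses))

def dac(c, attributes, classes):
    """DAC(c) is the size of DACSet(c)."""
    return len(dac_set(c, attributes, classes))
```

The approach described in Section 3.2 works with the elements returned by `cbo_set` and `dac_set` rather than with the counts.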

The mined associations are used to form subsets of entities (i.e., subsystems) 
organized in a hierarchy. This process is done by the junta algorithm [5]. This 
algorithm has two stages. First, the mined associations are used to guide a clustering 
process that forms groups of entities. This process is controlled by similarity functions 
that are based on the confidence of the mined associations. Second, the associations 
guide an iterative merging process in which the groups produced in the first stage are 
joined to form larger groups (i.e., subsystems). In each iteration, larger groups are 
formed until the largest groups reach a dissimilarity threshold. The result of this 
process is a series of abstract trees (i.e., a forest) representing hierarchies of groups. In 
other words, the outcome of this step is a hierarchical subsystem decomposition. Each 
subsystem consists of a group of entities that have high coupling and high data 
abstraction coupling. 

3.4 Allocation of Subsystems 

The objective in this step is to allocate the subsystems to one or more sites or 
processing elements. During allocation, one or more subsystems may be allocated to 
the same site. We allocate components with maximal locality of data and processing. 
The hierarchical subsystem decomposition of the system is suitable for allocation of 
components to sites. We traverse the subsystem hierarchy. We allocate to the same 
site subsystems that share the same parent in the hierarchical decomposition until a 
stop condition, such as the maximum capacity of the site, is reached. 
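
A simplified sketch of this traversal is given below. It is a greedy stand-in and not the binary integer programming formulation of [2]: the hierarchy is a nested dictionary, each leaf subsystem has a hypothetical `size`, every site has the same `capacity`, and a new site is opened whenever the next subsystem would exceed it.

```python
def leaves(node):
    """Yield (name, size) for leaf subsystems in depth-first order."""
    children = node.get("children", [])
    if not children:
        yield node["name"], node["size"]
    for child in children:
        yield from leaves(child)

def allocate(root, capacity):
    """Greedily pack leaf subsystems into sites, keeping siblings together."""
    sites, current, used = [], [], 0
    for name, size in leaves(root):
        if size > capacity:
            raise ValueError(f"{name} does not fit on any site")
        if used + size > capacity:
            # Stop condition reached: close this site and open a new one.
            sites.append(current)
            current, used = [], 0
        current.append(name)
        used += size
    if current:
        sites.append(current)
    return sites
```

Because the traversal is depth-first, subsystems that share a parent are visited consecutively and therefore end up on the same site while capacity allows.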

We use the following heuristics to allocate components to a hierarchical network. 

□ Build the subsystem hierarchy tree. 

□ Build a hierarchy network graph of the underlying interconnection network and 
processing elements. Networking capabilities are represented as layers that have to 
be traversed. The layers represent the structure of the interconnection network with 
costs associated for crossing layers. By identifying the layers at which the 
processing elements reside, it is possible to estimate the cost associated with the 
information flow between two processors [6]. We create a hierarchy network graph 
with processing elements as nodes and the cost associated with the information 
flow between two processing elements as edges. 

□ Map the component hierarchy tree to the hierarchy network graph with the 
objective of minimizing the cost of the information flow between processing 
elements. We use binary integer programming algorithms to solve this problem [2]. 

After the allocation, we implement the system as a distributed system using a 
middleware technology (such as CORBA) that allows the interconnection/interfacing 
of subsystems. CORBA is a set of specifications for providing interoperability and 
portability to distributed object-oriented applications. 

4 Summary 

We have applied the approach presented in this paper to medium size programs 
in C++ and Java. We used the CBO and DAC metric sets to drive the data mining and 
clustering process. We allocated the subsystems to different computers (Windows NT 
connected in a LAN) and used IDL and CORBA to implement the communication 
among the subsystems. The main benefits of the new system are the distribution of 
computation and the potential to integrate the system with other applications and 
components through CORBA. We are incorporating other metric sets and applying 
them to larger legacy systems. 

References 

1. Agrawal, R., Imielinski, T., Swami, A., "Mining Association Rules Between 
Sets of Items in Large Databases", Proc. ACM SIGMOD Int'l Conf. Mgmt. of 
Data, 1993, pp. 207-216. 

2. Bastarrica, M., Demurjian, S., Shvartsman, A., "Software Architectural 
Specification for Optimal Object Distribution", XVIII International Conference 
of the Chilean Computer Science Society, November 12-14, Antofagasta, Chile, 
1998. Available at: 

http://dlib.computer.org/conferen/sccc/8616/pdf/86160025.pdf 

3. Briand, L., Daly, J., Wüst, J., "A Unified Framework for Coupling 
Measurements in Object-Oriented Systems", IEEE Trans. on Software 
Engineering, Vol. 25, No. 1, 1999, pp. 91-121. 

4. De Lucia, A., Di Lucca, G., Fasolino, A., Guerra, P., Petruzzelli, S., "Migrating 
Legacy Systems Towards Object-Oriented Platforms", Int. Conf. in Software 
Maintenance, IEEE Computer Society Press, 1997, pp. 122-129. 

5. Montes De Oca, C., Carver, D., “Identification of Data Cohesive Subsystems 
Using Data Mining Techniques”, Int. Conf. on Software Maintenance, IEEE 
Computer Society Press, 1998, pp. 16-23. 

6. Purao, S., Jain, H., Nazareth, D., "Effective Distribution of Object Oriented 
Applications", Communications of the ACM, Vol. 41, No. 8, 1998, pp. 100-108. 

7. Shrivastava, C., Carver, D., “Using Low-Level Software Architecture for Software 
Maintenance of Object-Oriented Systems”, Proc. 1995 Software Engineering 
Forum, 1995, pp. 31-40. 

8. Sneed, H., "Encapsulating Legacy Software for Use in Client-Server Systems", 
Working Conference in Reverse Engineering, IEEE Computer Society Press, 
1996, pp. 104-119. 




Scaling the Data Mining Step in Knowledge Discovery 
Using Oceanographic Data 



Bruce Wooley, Susan Bridges, Julia Hodges, and Anthony Skjellum 

Department of Computer Science, Mississippi State University 
{bwooley, bridges, hodges, tony}@cs.msstate.edu 



Abstract. Knowledge discovery from large acoustic images is a 
computationally intensive task. The data-mining step in the knowledge 
discovery process that involves unsupervised learning (clustering) 
consumes the bulk of the computation. We have developed a technique 
that allows us to partition the data, distribute it to different processors 
for training, and train a single system to join the results of the 
independent categorizers. We report preliminary results using this 
approach for knowledge discovery with large acoustic images having 
more than 10,000 training instances. 



1 Introduction 

Knowledge discovery is an iterative process consisting of several steps including 
selection, preprocessing, transformation, data mining, and interpretation and 
evaluation [8]. We have worked collaboratively with the scientists at the Naval 
Oceanographic Office (NAVOCEANO) at the Stennis Space Center to develop a 
knowledge discovery process for extracting provinces of similar visual texture from a 
database of very large acoustic images using the unsupervised learning algorithm 
AutoClass [7]. This process is described in detail in Hodges et al. [9] and Bridges et 
al. [3] and is illustrated in figure 1. We first extract gray-level co-occurrence matrices 
using quantized (0..15) pixel values from regions in the image data. We then compute 
textural statistics on these gray level co-occurrence matrices to obtain a vector of 
features for each region [3, 10, 15]. We performed the data-mining step on this set of 
feature vectors using AutoClass to cluster the instances. Finally, we mapped the 
categorized regions back onto a copy of the original image by assigning a specific 
color or gray scale to each class and coloring the pixels in each region with the color 
associated with that region's class. 
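
The feature extraction step can be illustrated with a small sketch that counts gray-level co-occurrences in a quantized region and derives one textural statistic from them. The pixel offset, the non-symmetric counting, and the choice of the contrast statistic are assumptions; the statistics actually used are described in [3, 10, 15].

```python
def glcm(image, dx=1, dy=0, levels=16):
    """Count co-occurrences of gray levels for pixel pairs at offset (dx, dy)."""
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[image[r][c]][image[r2][c2]] += 1
    return counts

def contrast(counts):
    """Contrast: sum of p(i, j) * (i - j)^2 over the normalised matrix."""
    total = sum(sum(row) for row in counts)
    if total == 0:
        raise ValueError("co-occurrence matrix is empty")
    weighted = sum(v * (i - j) ** 2
                   for i, row in enumerate(counts)
                   for j, v in enumerate(row))
    return weighted / total
```

A feature vector for a region would collect several such statistics, possibly over more than one offset.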

The clustering step within our data mining process requires significantly more 
computational time than all the other steps in the process and requires all the training 
data to be resident in RAM during the training process. Our current image data 
consists of 16 images requiring approximately 220 Megabytes of disk space, but we 
expect to be working with over 300 image files requiring over 1.5 Gigabytes of 
storage in the near future. Accommodating massive amounts of data requires some 
scaling technique to be applied to the data-mining algorithm. The excessive time 
required to process the learning phase suggests the use of multiple processors. 

R. Loganantharaj et al. (Eds.): IEA/AIE 2000, LNAI 1821, pp. 85-92, 2000. 
© Springer-Verlag Berlin Heidelberg 2000 




86 Bruce Wooley et al. 



One approach to scaling the data mining process is to embed the scaling techniques 
directly in the learning algorithm. Two research groups have recently reported 
algorithms that accomplish this by performing incremental learning [2, 11]. An 
alternative approach is to partition the data into subsets, distribute the subsets to 
different processors, apply a sequential learning algorithm on each subset, and 
combine the learned results into a single classifier [4]. 





[Figure 1: flow diagram. Acoustic image -> region extraction -> feature extraction -> feature vector representation of instances, e.g. (Inst-1 a1 b1 c1) ... (Inst-n an bn cn) -> classification -> texture classes (Class 1: Inst-1, Inst-45, Inst-77; Class 2: Inst-14, Inst-105, Inst-300; ... Class m) -> visualization -> classified image]

Figure 1. The knowledge discovery process for provincing of the ocean floor 

Almost all previous work on the latter approach has been done with supervised 
learning systems where the correct classification is identified in the training data. The 
process of combining output from the different base classifiers may be as simple as a 
majority voting rule, or may use all the training data on all the base classifiers and 
use the results to train a meta-classifier [4]. This meta-classifier is then used to 
combine the results from base classifiers. Our domain, unsupervised learning, does 
not have the correct answer available to train this meta-categorizer. Additionally, a 
cluster called cluster X in one categorizer may not correspond to cluster X in another 
categorizer. Combining the results from multiple categorizers in unsupervised 
learning requires we match up the clusters by predicting on all categorizers using a 
subset of all the training data and correlating the results. In this paper we describe a 
method we have developed to extend the meta-learning approach of Chan and Stolfo 
[4, 5] to unsupervised learning. We also report some preliminary results obtained by 
using this method to categorize regions in our acoustical images of the ocean floor. 






Scaling the Data Mining Step in Knowledge Discovery Using Oceanographic Data 87 



2 Combining Categorizers For Unsupervised Learning 

We have developed an approach for adapting Chan and Stolfo's meta-learning 
techniques for use with unsupervised learning. One technique adapted, the use of an 
arbitration rule and arbiter, is presented in Wooley et al. [16]. The technique 
addressed in this paper is based on Chan and Stolfo's meta-learning technique that 
uses a combiner [4] as shown in figure 2. This approach divides the data into P 
disjoint data sets and trains P base classifiers. The role of the combiner is to learn the 
relationship between the predictions made by the P independent base classifiers so it 
can produce an accurate final prediction. Chan and Stolfo [4] present two techniques 
for creating instances to be used in the meta-level training. The first technique is to 
include in each instance the predictions from each of the base classifiers as well as the 
correct classification. This first technique is referred to as class-combiner. The 
second technique is to start with instances as in the class-combiner, and add the 
attribute vector from the raw data that was used to train the base classifiers. This 
second technique is referred to as class-attribute-combiner. 

Adapting Chan and Stolfo's technique to unsupervised learning requires that the 
combiner learn how to associate the category relationships as well as conflicts that 
arise by inconsistent predictions from the base categorizers. The technique we used 
for this paper corresponds to Chan and Stolfo's class-combiner, which uses the output 
of all the base categorizers to generate a new set of feature vectors. These feature 
vectors are then used to train an unsupervised learning algorithm that performs the 
function of the combiner. 

The training phase divides all the training instances ($I_1, \ldots, I_n$) into $P$ disjoint data sets 
($D_1, \ldots, D_P$), which are then distributed to the $P$ processors and used to train the $P$ 
independent categorizers ($C_1, \ldots, C_P$). Each categorizer identifies a certain number of 
categories in the data. The $j$th category of categorizer $C_i$ is referred to as $c_{ij}$. A subset 
of the instances ($I_k$) is then selected so that there are instances from all the $D_p$. Each of 
these $I_k$ is processed by all $P$ categorizers, producing a new feature vector 
$E_k = \{c_{1j_1}, c_{2j_2}, c_{3j_3}, \ldots, c_{Pj_P}\}$ for each $I_k$. To complete the meta-training, this new set of feature vectors 
is used to train an unsupervised learning algorithm (AutoClass) to produce a new 
set of categories ($N_1, N_2, \ldots$). 
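
The data-distribution part of the training phase can be sketched as follows. This is a minimal illustration only: the round-robin split, the function names `partition` and `sample_from_each`, and the fixed random seed are assumptions made for the example, not details taken from the paper.

```python
import random

def partition(instances, p):
    """Split the instances into p disjoint subsets (round-robin; an assumed scheme)."""
    return [instances[i::p] for i in range(p)]

def sample_from_each(partitions, fraction, seed=0):
    """Draw a fraction of every partition so that each D_p is represented.

    Assumes every partition is non-empty; at least one instance is taken
    from each partition.
    """
    rng = random.Random(seed)
    sample = []
    for part in partitions:
        k = max(1, round(fraction * len(part)))
        sample.extend(rng.sample(part, k))
    return sample

# Hypothetical usage: 20 instances, 4 processors, a 20% meta-training subset.
parts = partition(list(range(20)), 4)
meta_subset = sample_from_each(parts, 0.2)
```

Each subset in `parts` would be handed to one processor for base training, while `meta_subset` would be run through every trained categorizer.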

To use this system in prediction mode, an instance $I_k$ is submitted to each of the 
base categorizers, resulting in a new feature vector $E_k = \{c_{1j_1}, c_{2j_2}, \ldots, c_{Pj_P}\}$. This 
feature vector is submitted to the combiner, which predicts which category 
($N_1, N_2, \ldots$) it belongs in; this is the final categorization of the instance $I_k$. 
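
The meta-level training and prediction steps might look roughly like the sketch below. The paper trains AutoClass on the meta-level vectors; here a much simpler stand-in combiner is assumed, which just assigns one new category to each distinct tuple of base labels. The base categorizers are hypothetical callables that return an integer label.

```python
from collections import Counter

def build_meta_vectors(instances, categorizers):
    """Turn each instance into the tuple of labels given by every base categorizer."""
    return [tuple(c(x) for c in categorizers) for x in instances]

def train_combiner(meta_vectors):
    """Stand-in combiner (NOT AutoClass): one new category per distinct label tuple,
    numbered from the most frequent tuple to the least frequent."""
    counts = Counter(meta_vectors)
    return {vec: new_id for new_id, (vec, _) in enumerate(counts.most_common())}

def predict(instance, categorizers, combiner):
    """Final categorization; returns None for a label tuple never seen in training."""
    vec = tuple(c(instance) for c in categorizers)
    return combiner.get(vec)

# Two hypothetical base categorizers that find the same two clusters
# but number them differently (cluster 0 in one is cluster 1 in the other).
cat_a = lambda x: 0 if x < 5 else 1
cat_b = lambda x: 1 if x < 5 else 0

combiner = train_combiner(build_meta_vectors([1, 2, 3, 8, 9], [cat_a, cat_b]))
```

Because the combiner only sees label tuples, it does not matter that the two base categorizers disagree on cluster numbering: the tuple `(0, 1)` is consistently mapped to a single new category.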

As the number of processors (P) increases, the size of the feature vector also 
increases. The training time required for meta learning is significantly less than for 
the original categorizers. This is because the new feature vector contains a smaller 
number of features, the features are nominal data, there is a large amount of 
duplication in the feature vectors, and each feature has restrictions on the range 
imposed by the maximum number of classes in each base categorizer. 








Figure 2. An arbiter with two classifiers. From [4]. 



3 The Classification Task and Data 

The categorization system we are working with uses the visual texture of acoustic 
images to “province” the ocean floor. The acoustic images (provided by 
NAVOCEANO at Stennis Space Center) were collected from a 100 kHz Chirp Side-Scan 
Sonar using a Data Sonics SIS 1000. Figure 3a gives an example of a portion of 
one of these images. A region-growing process based on the techniques described in 
Reed and Hussong [13] is used to divide each image into irregularly shaped 
homogeneous regions that are the instances to be categorized. A more detailed 
description of our implementation of region-growing is provided in Wooley and 
Smith [15] and Bridges et al. [3]. Four gray-level co-occurrence matrices (GLCMs) 
are computed for each region (one in each of 4 directions). Secondary texture 
statistics are computed from the GLCMs as described in Bridges et al. [3] and 
Karpovich [10]. These statistics are used to form a feature vector for each region 
(instance). AutoClass C was obtained from NASA Ames [12] and used to cluster the 
images. Categorized images are built, which correspond to the original images, and 
the results of the knowledge discovery process are evaluated visually. 
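
As an illustration of the feature-extraction step, the sketch below computes one GLCM per direction and a single secondary statistic (contrast) from each. Several things here are assumptions for the example: a rectangular patch is used rather than an irregular region, the pixel offset is one pixel, and contrast merely stands in for whichever statistics the cited works actually use.

```python
def glcm(patch, dx, dy, levels):
    """Count co-occurrences of gray levels (i, j) for pixel pairs at offset (dx, dy)."""
    m = [[0] * levels for _ in range(levels)]
    rows, cols = len(patch), len(patch[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                m[patch[r][c]][patch[r2][c2]] += 1
    return m

def contrast(m):
    """Contrast statistic: sum of (i - j)^2 weighted by the normalized counts."""
    total = sum(map(sum, m))
    return sum((i - j) ** 2 * v
               for i, row in enumerate(m)
               for j, v in enumerate(row)) / total

# Offsets for 0, 45, 90 and 135 degrees (rows increase downward).
OFFSETS = [(1, 0), (1, -1), (0, -1), (-1, -1)]

def texture_features(patch, levels):
    """One contrast value per direction, forming a small feature vector."""
    return [contrast(glcm(patch, dx, dy, levels)) for dx, dy in OFFSETS]

patch = [[0, 0, 1, 1],
         [0, 0, 1, 1],
         [0, 2, 2, 2],
         [2, 2, 3, 3]]
features = texture_features(patch, levels=4)
```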



4 The Parallel Processing Environment 

Our objectives in designing the parallel environment for clustering algorithms were 
twofold. First, we wanted to make minimal (or no) changes to the source code of the 
clustering algorithms. Second, we wanted the ability to run each process of a parallel 
clustering algorithm execution on a different dataset in both the learning and 
classification modes. Additionally, we used the Message Passing Interface (MPI) 
library [14], a de facto standard for parallel processing, to write our parallel programs, 
making our system executable in a wide variety of parallel environments. 

Much of the parallelization effort involved writing the programs to distribute data 
and gather results along with some changes to the clustering algorithm. Although 
changes were made to the clustering algorithm (AutoClass) for this parallelization 
process, we are developing a parallel shell that will allow the use of any clustering 








algorithm without changes. The serial (original) version of AutoClass reads and 
creates files with fixed names, which is problematic since we run several instances of 
AutoClass simultaneously. In order to eliminate the analysis and code modification 
required to make each AutoClass process use files with unique names, we wrote a few 
lines of MPI code to change the working directory of each AutoClass process, based 
on its process index. This allows each process to execute in its own area, without 
possibility of file name collisions. 
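
The working-directory trick can be sketched as below. The original was a few lines of MPI code; this Python version is only an analogue. The directory naming scheme and the function name `enter_private_workdir` are hypothetical, and the process index is passed in as a plain integer (in an MPI program it would be the rank of the calling process, for example from mpi4py's `MPI.COMM_WORLD.Get_rank()`).

```python
import os

def enter_private_workdir(base_dir, rank):
    """Create (if needed) and switch to a per-process directory.

    Every process then reads and writes its fixed-name files inside its own
    directory, so identical file names cannot collide.
    """
    workdir = os.path.join(base_dir, f"proc_{rank:03d}")  # hypothetical naming scheme
    os.makedirs(workdir, exist_ok=True)
    os.chdir(workdir)
    return workdir
```

The unmodified clustering program would then be started after this call, with its input files copied or linked into the per-process directory beforehand.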

The hardware we used for our experiments was an Avalon A12 multicomputer 
consisting of eight DEC Alpha 21164A CPUs running at 400 MHz [1]. Each 
processor on this distributed memory machine has 256 megabytes of RAM. 
Inter-processor communication is provided via a fully interconnected 14-channel crossbar 
switch, with each channel supporting data-streams of 400 megabytes per second, or 
200 megabytes per second in each direction. 



5 Experiments and Results 

In the experiments described below, the data (feature vectors) were extracted from 12 
acoustical images that were split into two parts to eliminate the shiptrack. Since there 
is no known “correct” class for each of the instances, the classification results from 
the multicomputer runs were visually compared to those from the single processor 
run. Results from single processor runs were evaluated based on information 
provided by geologists at NAVOCEANO. 

The algorithm described above was run on 1 to 8 processors. Instances from the 
training dataset were distributed to the processors. We fixed the number of classes at 
9, which generated good results for single processor experiments performed in the 
past. The training time for the distributed training data is linear as expected, and the 
training time for the meta-learning using 20% of the original data is relatively 
constant, and significantly less than the training time of a base categorizer on 20% of 
the data. This is due to the reduced number of features, the nominal data types in the 
feature vectors, and to the large amount of duplication in the feature vectors. The 
results of prediction for each of the different number of processors are shown for an 
example image in figure 3. 

Figure 3 contains six images. The first image (figure 3a) is the original sonar image 
of the ocean floor. The other five images are the results of applying 
clustering algorithms to categorize the regions of the original image. The label on 
each of the classified images identifies the number of processors (e.g. P3 - three 
processors) that participated in the clustering process. The gray scale shade of each 
image is insignificant. What matters are the patterns identified by the shading and 
how closely they represent the similar texture regions in the original image. Note that 
the patterns from all the images are similar. 

These initial results are very encouraging. We are in the process of refining this 
technique and of developing more objective means of evaluating the results. 







6 Acknowledgements 

This work was sponsored by the Office of Naval Research under grant ONR-N00014-96-1276, 
by Mississippi State University College of Engineering from the Hearin 
Educational Enhancement Fund, and NASA Stennis under grant NAS 1398033-20-99010033. 



References 

1. Avalon Computer Systems, Inc. 1998. Avalon Series A12 Parallel 
Supercomputers, http://www.teraflop.com/html/al2.html, accessed May 15, 
1998. 

2. Bradley, P. S., Usama Fayyad, and Cory Reina. 1998. Scaling clustering 
algorithms to large databases. In Proceedings of the Fourth International 
Conference on Knowledge Discovery and Data Mining. Edited by Rakesh 
Agrawal and Paul Stolorz. Menlo Park, CA: AAAI Press. 9-15. 

3. Bridges, Susan, Julia Hodges, Bruce Wooley, Donald Karpovich, George 
Brannon Smith. 1998. Knowledge discovery in an oceanographic database. 
Submitted for publication. 

4. Chan, Philip K., and Salvatore J. Stolfo. 1995. Learning arbiter and combiner 
trees from partitioned data for scaling machine learning. In Proceedings of the 
First International Conference on Knowledge Discovery and Data Mining. 
Edited by Usama Fayyad and Ramasamy Uthurusamy. Menlo Park, CA: AAAI 
Press. 39-44. 

5. Chan, Philip K., and Salvatore J. Stolfo. 1996. Scalable exploratory data mining 
of distributed geoscientific data. In Proceedings of the Second International 
Conference on Knowledge Discovery and Data Mining. Edited by Evangelos 
Simoudis, Jiawei Han and Usama Fayyad. Menlo Park, CA: AAAI Press. 2-7. 

6. Cheeseman, Peter, and John Stutz. 1996. Bayesian classification (AutoClass): 
Theory and results. Advances in Knowledge Discovery and Data Mining. 
Edited by Usama M. Fayyad, Gregory Piatetsky-Shapiro, Padhraic Smyth, and 
Ramasamy Uthurusamy. Menlo Park, CA: AAAI Press. 158-180. 

7. Cheeseman, P., J. Kelly, M. Self, J. Stutz, W. Taylor, and D. Freeman. 1988. 
AutoClass: A Bayesian classification system. In Proceedings of the Fifth 
International Conference on Machine Learning. Reprinted in Readings in 
Machine Learning, edited by Jude W. Shavlik and Thomas G. Dietterich, San 
Mateo, CA: Morgan Kaufmann Publishers, Inc. 296-306. 

8. Fayyad, Usama M., Gregory Piatetsky-Shapiro, and Padhraic Smyth. 1996. 
From data mining to knowledge discovery: An overview. Advances in 
knowledge discovery and data mining. Edited by Usama M. Fayyad, Gregory 
Piatetsky-Shapiro, Padhraic Smyth, and Ramasamy Uthurusamy. Menlo Park, 
CA: AAAI Press. 1-36. 

9. Hodges, Julia, Susan Bridges, Bruce Wooley, Donald Karpovich, and Brannon 
Smith. 1997. Knowledge Discovery in an Object-Oriented Oceanographic 
Database System. October 21, 1997. Mississippi State University Technical 
Report #971021. 







10. Karpovich, Donald. 1998. Choosing the optimal features and texel sizes in 
image categorization. In Proceedings of the 36th ACM Southeast Conference 
held in Marietta, GA, April 1-3, 1998. 104-107 

11. Livny, Miron, Raghu Ramakrishnan, and Tian Zhang. 1998. Fast density and 
probability estimation using CF-Kernel method for very large databases. 
http://www.cs.wisc.edu/~zhang/birch.html, accessed Oct 1998. 

12. NASA Ames Research Center, Computational Sciences Division. 1998. 
AutoClass C General Information, http://ic-www.arc.nasa.gov/ic/projects/bayes- 
group/autoclass/autoclass-c-program.html, accessed May 15, 1998. 

13. Reed, Thomas Beckett IV, and Donald Hussong. 1989. Digital image 
processing techniques for enhancement and classification of SeaMARC II side 
scan sonar imagery. Journal of Geophysical Research. 94(B6). 7469-7490. 

14. Snir, Marc, Steve W. Otto, Steven Huss-Lederman, David W. Walker, and Jack 
Dongarra. 1996. MPI: The Complete Reference. Cambridge, Massachusetts: 
The MIT Press. 

15. Wooley, Bruce and George Brannon Smith. 1998. Region-growing techniques 
based on texture for provincing the ocean floor. In Proceedings of the 36th 
ACM Southeast Conference held in Marietta, GA, April 1-3, 1998. 99-103. 

16. Wooley, Bruce, Yoginder Dandass, Susan Bridges, Julia Hodges, and Anthony 
Skjellum. 1998. Scalable knowledge discovery from oceanographic data. In 
Intelligent engineering systems through artificial neural networks. Volume 8 
(ANNIE 98). Edited by Cihan H Dagli, Metin Akay, Anna L Buczak, Okan 
Ersoy, and Benito R. Fernandez. New York, NY: ASME Press. 413-24. 




Information Management and Process Improvement 
Using Data Mining Techniques 

Gibbons, WM ¹, Ranta, M ², Scott, TM ¹, and Mantyla, M ² 

¹ University of Ulster, N. Ireland 
mw.gibbons@ulst.ac.uk 
² Helsinki University of Technology, Finland 
mervi.ranta@hut.fi 

Abstract. This paper describes a computer component manufacturing scenario 
and concentrates on the application of data mining techniques to improve 
information management and process improvement within it. The case study 
concerns an engineering component 
manufacturing company with a consortium of several plant outlets in various 
geographical locations world-wide. Currently, data, information and 
knowledge management, transparency and communication between plants 
within the scenario are limited. As a result, important information and best 
practices are not collectively pooled or communicated with the objective of 
improving information management and process development procedures on a 
global basis. This paper explores the possible enhancement of these consortium 
relationships and investigates the use of data mining techniques with the 
potential to advance information management and process improvement within 
the manufacturing environment concerned. 



1 Introduction 

Contemporary organizations are inundated with data; however, they have little 
information, even less knowledge, and perhaps no wisdom. Many companies simply 
hold data in record or archive format, so potentially valuable information 
hidden within these databases remains untapped [1]. The sheer volume of data held in 
corporate databases, in particular, is already too great for manual analysis and 
understanding, and as the information within them grows, the problems are similarly 
compounded. Data mining techniques possess the potential to enhance process 
improvement, information management and communication within the manufacturing 
environment of this case study, on a global basis. In brief, the current information 
management and process improvement situation within the manufacturing consortium 
is examined and an improved position is recommended. 

2 Computer Component Manufacturing Scenario 

Our case study is based on a large computer component manufacturer that operates in 
several geographical areas. In order to keep our discussion focused and to emphasize 
the characteristics relevant to this research, the consortium has been simplified to cover 
just two geographical areas. The case study described in this section involves one of 
the largest global magnetic recording manufacturers. This particular 
organization is engaged in the time-critical fabrication of data storage products 
worldwide. The problem domain area at the center of the scenario consortium is 



R. Loganantharaj et al. (Eds.): IEA/AIE 2000, LNAI 1821, pp. 93-98, 2000. 
© Springer-Verlag Berlin Heidelberg 2000 




94 W.M. Gibbons et al. 



outlined succinctly, as the intricacy of the process operations is not within the scope 
of this paper. Instead, the focus is directed to the critical areas of 
information management and process improvement and how the application of data 
mining techniques and communication within all plants concerned in the consortium 
can yield long-term strategic benefits. The problem domain area is discussed with this 
perspective in mind. 

The company as a whole includes many different divisions, of which we focus on 
only one, the recording head division. This consists of a headquarters and individual 
plants that are located in different areas. In each area the operation is divided into a 
manufacturing plant and an assembly plant. The division therefore comprises the 
Headquarters (HQ), which manages and co-ordinates the operation; Manufacturing 
Plant A (MP-A), which manufactures the basic components for recording heads for 
market area A; Assembly Plant A (AP-A), which is responsible for assembling 
components into final products for market area A; Manufacturing Plant B (MP-B), 
which is similar to MP-A except that it is located in area B; and Assembly Plant B 
(AP-B), which is similar to AP-A except that it is located in area B. 




Fig. 1. Illustrates the operations within the recording head division during a customer order 
satisfaction process. The thin arrows illustrate the information flow and the thick arrows are 
used to display the flow of material between plants. 

2.1 Challenges of the Scenario 

The most important challenges found in the investigation can be categorized as 
follows: 

1. The limited co-ordination and management of information, data and knowledge 
among the consortium partners. 













2. The deficient transparency and communication of data, information and 
knowledge across areas and between MPs and APs. 

3. The difficulty of discovering new and relevant manufacturing information and 
knowledge, and the problems of managing process control within the 
manufacturing process. 

On investigation, many problems were identified within the current situation of the 
manufacturing consortium; however, the above problems had the most impact, in 
particular the difficulty of managing process control. 



2.2 Improved Situation 

Solutions to improve the current situation within the manufacturing consortium are 
based on the following factors: improved information management techniques, 
engineering ontologies and the application of data mining techniques. Each of these 
proposed approaches for improvement is outlined in this section, and figure 2 
illustrates the recommended improved situation. 




Fig. 2. Shows four two-directional gray arrows with the label "A: co-ordination" connecting the 
HQ to all of the MPs and APs. They demonstrate the need for the HQ to carry out information 
management at the division level in order to be able to establish and implement efficient 
division strategies. Division-wide, or even company-wide, information management allows the 
definition and sharing of best practices among the plants. 

Ontologies, product and process modeling and the communication of these will 
improve the communication and transparency of information across areas and between 
plants. Engineering ontologies can assist in modeling the 
manufacturing product and process, which in turn makes communication 
between plants and 
across areas more comprehensive. Communication of these via the new 
relationships described in the co-ordination and information management problem will 













permit the transparency and communication of information among the manufacturing 
consortium partners. This may be accomplished by newly developed engineering 
ontologies and terminologies. Engineering ontologies can also provide the raw data in 
an acceptable format with which to conduct data mining. Communication can again be 
enforced by HQ and made easy by a data warehouse and suitable data management, 
analysis, retrieval and transfer tools within all plants. However, this is not within the 
scope of this paper [2]. 

Within the manufacturing process, an engineer monitors and analyses the critical 
control parameters within the manufacturing processes that are crucial in determining 
the quality of the final product. Engineers employ these methods to exercise control 
over the process in order to improve it. The process is monitored and analyzed on a 
daily basis, as all manufacturing processes are subject to variation, causing 
manufactured parts to vary in shape and size. Due to the dynamic nature of magnetic 
recording manufacturing, engineers rarely have sufficient time to uncover all control 
parameters before the process changes. Data mining techniques have the potential to 
improve this situation by automating the search for second order process control 
parameters, providing predictive process control and assisting in the discovery of new and 
relevant process information and knowledge. 

This section identified and outlined a number of problems among the relationships 
regarding information and process management operations within the consortium. 
However, it is evident that co-ordination, information management, transparency and 
communication of information are needed primarily in order to discover new 
manufacturing knowledge properly. Ontologies, product and process modeling can 
facilitate this transparency and aid information communication for improved data 
mining within the consortium. This is, however, the basis for further research 
considering the use of engineering ontologies. 

3 Discovery of Manufacturing Process Information and 
Knowledge 

This section discusses research currently being conducted onsite within the 
manufacturing consortium involving the implementation of data mining techniques to 
improve the discovery of new and relevant manufacturing process knowledge. The 
first aspect of current research in this area involves the identification of second order 
process control parameters. The problems discussed in section two outline the major 
process control, time delay and yield problems prominent in the magnetic recording 
industry in particular, including the intricacy of the process operations. One such 
operation involves plasma machines, which are widely utilized during critical steps 
within the recording head and semiconductor manufacturing processes. 

The complexity of the physics involved and inability to monitor the plasma 
characteristics on a real-time basis makes the control of this process equipment very 
difficult. Work is currently being carried out on one of the most critical process steps 
during the manufacture of recording heads. 

Current research has illustrated that a greater level of process control can be 
achieved using a predictive model. This may be accomplished by applying 
multivariate or neural network analysis techniques on process data, including 







production lot history data, in-line physical measurement data, real-time equipment 
data, equipment maintenance data and end-of-line product test and final yield data. 
Due to the complexity of the equipment parameters involved and the high degree of 
correlation between them, it was necessary to employ multivariate analysis techniques such 
as Principal Component Analysis (PCA). This is a powerful technique which applies 
data reduction to highly correlated data variables, thereby simplifying the 
analysis process [3]. An in-depth analysis was carried out on over one hundred 
parameters using the PCA technique of the Unscrambler software (by Camo). 
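
To illustrate why PCA helps with highly correlated parameters, the sketch below extracts the first principal component of a small data set by power iteration on the covariance matrix. It is a generic, dependency-free illustration rather than the procedure used by the Unscrambler software; the function name and the toy data are assumptions for the example.

```python
import math

def first_principal_component(rows, iterations=200):
    """Return (direction, explained_variance_ratio) of the first principal component.

    Uses power iteration on the sample covariance matrix. Assumes the data has
    non-zero variance and that the start vector is not orthogonal to the
    leading eigenvector.
    """
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [[sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(d)]
           for a in range(d)]
    v = [1.0] * d
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    variance = sum(v[a] * sum(cov[a][b] * v[b] for b in range(d)) for a in range(d))
    total = sum(cov[a][a] for a in range(d))
    return v, variance / total

# Two perfectly correlated (toy) process parameters collapse onto one component.
direction, ratio = first_principal_component([[1, 2], [2, 4], [3, 6], [4, 8]])
```

When parameters are strongly correlated, as in this toy case, a single component carries almost all of the variance, which is what makes the subsequent analysis simpler.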

A predictive model was built using The Unscrambler's multivariate analysis 
technique Partial Least Squares Regression (PLS) and a feed-forward backpropagation 
three layer neural network [4]. PLS was used primarily to reduce the dimension of the 
data, the results of which were used as input for the neural network. Three different 
models were built and explored individually. In the first model, process parameters 
including wafer feature dimensions, photo step process parameters, plasma machine 
data, Meas1 and Meas2 were used to model and predict the final test results (Ftest). 
Promising results were obtained, as depicted in figure 3. 




Fig. 3. Illustrates the Ftest results of the neural network predictive model 



The accurate prediction of the final test results at this stage has yielded two 
significant benefits. First, the neural network model (built using ISL’s 
Clementine software) proved to be a better predictor of the final test results than 
the Meas2 results, providing a better level of control. Secondly, as a result 
of accurately predicted final test results at Meas2, it has been established that the 
lengthy control loop prior to this step is redundant. 

In the second model, all input parameters except Meas2 are used to model the final 
test results. As a result, as soon as the plasma-processing step was over, it was 
possible to predict the final test results and take corrective action for the next wafer in 
line if required. Previously, it was not possible to implement control action for the next 
wafer waiting to be finally tested, as equipment performance could only be assessed 
after Meas2 was noted. The second model brings additional 
benefits, as it enables control action to be taken immediately after the processing step without 
waiting for the Meas2 results, thus reducing the control loop by up to four days. 
Another advantage of this model is that Meas2 and associated process steps 






may also be eliminated, which would shorten the overall process cycle time and reduce 
line yield losses at these steps. In the third model, modeling is conducted 
immediately after Meas1 using previous run equipment data. This model demonstrates 
that preventative actions can be taken so that the expensive action of scrapping or 
reworking wafers could be avoided, achieving run-to-run control in the process. At 
the moment, this research is working towards implementing these 
models as part of the final testing (Ftest) control strategy. 



4 Conclusions and Work in Progress 

This paper provided a case study of a manufacturing consortium. In the current 
situation, the plants of the consortium operate on 
quite an autonomous basis and only communicate process information as required by 
their position within the network of consortium relationships. On investigation, several 
inherent problems were identified within the current structure and relationships of the 
consortium as a whole. The solutions proposed in this work comprise 
global communication, exchange of ideas and information management techniques in 
the form of data mining, ontological, product and process modeling methods, which 
have the potential to impact upon manufacturing process control. The data 
mining procedures currently in progress also suggest possible methods of 
enhancing process control and information management between plants, making 
co-ordination, management, transparency and communication of process information 
more accessible to all consortium partners. 



References 

[1]. Fayyad, U, Piatetsky-Shapiro, G, Smyth, P & Uthurusamy, R, “Advances in 
Knowledge Discovery and Data Mining”, AAAI Press, 1996. 

[2]. Mantyla, M., and Ranta, M. (1998) Engineering Process Ontologies for 
Communication, Co-operation, and Co-ordination in a Virtual Enterprise, CD-Rom 
of Prolamat '98, Trento, Italy, September 1998. 

[3]. Fan, H-T & Wu, S, (1995) “Case Studies on Modelling Manufacturing Processes 
Using Artificial Neural Networks”, Journal of Engineering for Industry, 
Vol. 117, No. 3, pp. 412-417, 1995. 

[4]. Esbensen, K, Midtgard, T, Schonkopf, S (1994) Multivariate Analysis in Practice, 
Camo AS, Trondheim. 




A Comparative Analysis of Search Methods as Applied 
to Shearographic Fringe Modelling 



Paul Clay ¹, Alan Crispin ¹, and Sam Crossley ² 



¹ Leeds Metropolitan University, School of Engineering 
Calverley Street, Leeds LS1 3HE 
a.crispin@lmu.ac.uk 
² AOS Technology Ltd. 
46 Pate Road, Melton Mowbray, LE13 0RG 
sdc@aost.co.uk 



Abstract. Applications of shearography in industry include the detection of 
strain anomalies which result when engineering components containing defects 
are subjected to stress. The output derived from shearographic apparatus is a 
fringe pattern which is used to confirm the integrity of, or characterise defects 
within, the component under test. A step towards the automation of the process 
is to convert the fringe lines into a mathematical representation that a computer 
can use for analysis. Modelling can be achieved by fitting B-spline curves to the 
fringe patterns and using a search to find a best fit. The paper compares the 
run time performance of three search methods applied to this 
problem, namely discrete hill-climbing, random mutation hill-climbing and a 
genetic algorithm. 



1. Industrial Application 



Shearography is a full-field, non-contact optical non-destructive testing (NDT) 
technique which is increasingly being used to assess the integrity of engineering 
components and structures. The technique is based on the detection of surface strain 
anomalies which arise from defects within or below the surface of an object subjected 
to mechanical or thermal load [1]. The technique can be applied to metallic structures 
but is particularly suited to composite, non-metallic materials, whose non- 
homogeneity makes inspection by other, more conventional techniques, such as 
ultrasonic testing, difficult, time-consuming and potentially unreliable. Commercial 
deployment of shearography systems includes the inspection of E3 (AWACS) 
radomes (USAF and RAF), helicopter rotor blades (Aerospatiale), marine hulls 
(RNLI) and automotive components (e.g. Rover group) [2]. 



R. Loganantharaj et al. (Eds.): IEA/AIE 2000, LNAI 1821, pp. 99-108, 2000. 
© Springer-Verlag Berlin Heidelberg 2000 




100 Paul Clay etal. 



The technique involves illuminating a test piece with an expanded laser beam. 
Reflected light from the surface of the test piece is focused onto the image plane of a 
shearographic camera, which is interfaced to a computer. The shearographic camera 
typically consists of a Michelson interferometer placed in front of the lens of a CCD 
camera. The interferometer produces a pair of laterally sheared images that interfere 
with one another to form a speckle pattern. When the test piece is stressed the 
intensity distribution of the speckle pattern is changed. In the so-called ‘double 
exposure method’ two images are captured before and after stressing the sample [3]. 
The pixel by pixel subtraction between the two images yields shearographic fringes 
such as those shown in Fig. 1(a) which are speckled in nature. 




Fig. 1. (a) Raw shearographic image for pipe; (b) fringe search performed on filtered image



The shearographic image of Fig. 1(a) has been taken from tests on a metallic pipe 
subject to thermal load. If stress is even over the field-of-view of the pipe then the 
observed fringe lines are parallel and even along the direction of the pipe. Weaknesses 
cause increased stress in local areas and this produces a non-uniform fringe pattern 
such as pinching. The image of Fig. 1(b) is pre-processed using a 25x25 pixel sliding 
block averaging filter to reduce speckle noise with histogram equalisation used to 
enhance contrast prior to applying a fringe search. 

At present the interpretation of shearographic images is a manual process requiring 
the presence of a skilled operator. Experts examine images of pipes for pinching to 
diagnose areas of stress anomalies. A step towards the automation of the process is to 
convert the fringe lines into a mathematical representation that a computer can use for 
analysis. This can be achieved by fitting parametric B-spline curves to each of the
fringe lines. This paper discusses the implementation and compares the run time
performance of three search methods for finding best fit curves to shearographic
fringe lines.






2. Parametric Modeling of Fringe Lines 

The study has focussed on modeling fringe lines using B-spline curves [4]. A single 
curve can be generated from quadratic B-spline curve segments with each segment 
defined by three control points. Parametric curve segments start and end on knot 
points which are the mid positions between the control points. 

Quadratic B-spline curves are generated by multiplying an approximating function, 
which is expressed in terms of a parametric parameter u scaled between 0 and 1, by a 
matrix which contains a subset of the control points using the equation below. 



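\[
P(u) =
\begin{bmatrix}
\dfrac{(1-u)^2}{2} & \dfrac{-2u^2 + 2u + 1}{2} & \dfrac{u^2}{2}
\end{bmatrix}
\begin{bmatrix} P_i \\ P_{i+1} \\ P_{i+2} \end{bmatrix}
\tag{1}
\]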

Here i varies from i=1 to i=N-2 with N being the number of control points. P(u) =
[x(u) y(u)] is a generated point on the B-spline curve in the interval 0 ≤ u ≤ 1.
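
As an illustration of the quadratic B-spline segment described above, the following minimal Python sketch evaluates one segment from three consecutive control points and samples a whole curve. The function names and the `samples_per_segment` parameter are assumptions made for this example and do not come from the paper.

```python
def bspline_segment_point(p0, p1, p2, u):
    """Point on one quadratic B-spline segment for 0 <= u <= 1."""
    b0 = (1.0 - u) ** 2 / 2.0
    b1 = (-2.0 * u * u + 2.0 * u + 1.0) / 2.0
    b2 = u * u / 2.0
    x = b0 * p0[0] + b1 * p1[0] + b2 * p2[0]
    y = b0 * p0[1] + b1 * p1[1] + b2 * p2[1]
    return (x, y)


def bspline_curve(control_points, samples_per_segment=10):
    """Sample the curve: one segment per consecutive triple of control points."""
    if len(control_points) < 3:
        raise ValueError("need at least three control points")
    points = []
    for i in range(len(control_points) - 2):
        p0, p1, p2 = control_points[i:i + 3]
        for k in range(samples_per_segment):
            u = k / samples_per_segment
            points.append(bspline_segment_point(p0, p1, p2, u))
    # Add the final knot (u = 1 on the last segment).
    points.append(bspline_segment_point(*control_points[-3:], 1.0))
    return points
```

At u = 0 and u = 1 the segment lands on the midpoints between neighbouring control points, which matches the description of the knot points given earlier.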

The x values of the initial and final control points are determined by positioning
knots on the edges of the image. These are allowed to move with one degree of 
freedom (i.e. along the y-axis). Intermediate control points (i.e. those between the end 
knots) each have two degrees of freedom being allowed to move along both the x and 
y axis to any co-ordinate position bounded by the image plane. 



3. Objective Function 

It is necessary to calculate a fitness value which determines how well a given B-spline 
curve fits a fringe. The characteristics of a dark fringe which allow it to be described 
in terms of an objective function are: (i) it is a low intensity area compared to regions 
either side (ii) points along a fringe should follow a minimum intensity gradient path 
and (iii) no two fringes cross. A fitness value can be calculated by finding a local 
neighbourhood intensity error value e(x,y) around each point along the curve and 
averaging these values to calculate a total intensity error for a curve generated by a set 
of control points. A local neighbourhood intensity value is calculated as the sum of the 
element-by-element multiplication of a gaussian mask (which aids in centering the line 
midway between higher intensity areas) and the corresponding image intensity 
elements around the current pixel. This results in a weighted mean value for the 
intensity error e(x,y) at each point along the curve. The local point error is calculated 
using: 







\[
e(x,y) = \sum_{i=\lceil -a \rceil}^{\lceil a \rceil} \; \sum_{j=\lceil -a \rceil}^{\lceil a \rceil} M_{i,j} \, I_{(x+i)(y+j)}
\tag{2}
\]

where ⌈·⌉ signifies rounding up to the next integer, M_{i,j} is the mask value at
point (i,j), where M_{0,0} is at the centre of the mask, I_{(x+i)(y+j)} is the
intensity value at point (x+i, y+j), b is the length of the mask block side, and
a = (b-1)/2.

The fitness value (line error) can be calculated as the mean value of the local 
intensity errors at all pixel points along a curve: 



\[
\text{line error} = \frac{1}{N} \sum_{n=1}^{N} e\bigl(x(n), y(n)\bigr)
\tag{3}
\]

where N is the total number of points along the curve. The mean intensity error 
calculated for a curve always lies between 0 (all black pixels) and 1 (all white pixels). 
A low mean error value represents a curve having a good fit to a fringe intensity area. 

It has been found that two other terms appended to Equation 3 are required to 
prevent undesirable solutions. The first of these is the variance which provides a 
measure of the dispersion of e(x,y) from the mean and so penalises curve solutions 
which cross between two or more fringes as these will have higher variance than a 
curve occupying a single intensity area. The second additional term checks that 
control points are in sequential order and prevents curve solutions which loop back on 
themselves as loops do not occur in this case. 
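
The objective function described in this section can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the image is a list of rows with intensities in [0, 1], the Gaussian mask is normalised to sum to 1 so that the local error is a weighted mean, pixels outside the image are clamped to the border, and the mask size and sigma are free parameters. The paper does not state how the variance and ordering terms are weighted, so the sketch only returns the mean and the variance.

```python
import math


def gaussian_mask(b, sigma):
    """b x b Gaussian mask, normalised to sum to 1 (b is assumed odd)."""
    a = (b - 1) // 2
    raw = [[math.exp(-(i * i + j * j) / (2.0 * sigma * sigma))
            for j in range(-a, a + 1)] for i in range(-a, a + 1)]
    total = sum(sum(row) for row in raw)
    return [[v / total for v in row] for row in raw]


def local_error(image, x, y, mask):
    """Weighted mean intensity around pixel (x, y); image[y][x] lies in [0, 1]."""
    a = (len(mask) - 1) // 2
    height, width = len(image), len(image[0])
    e = 0.0
    for i in range(-a, a + 1):
        for j in range(-a, a + 1):
            # Assumption: pixels outside the image are clamped to the border.
            xi = min(max(x + i, 0), width - 1)
            yj = min(max(y + j, 0), height - 1)
            e += mask[i + a][j + a] * image[yj][xi]
    return e


def line_error(image, curve_pixels, mask):
    """Mean and variance of the local errors along a curve."""
    errors = [local_error(image, x, y, mask) for x, y in curve_pixels]
    mean = sum(errors) / len(errors)
    variance = sum((e - mean) ** 2 for e in errors) / len(errors)
    return mean, variance
```

A curve lying on a dark fringe yields a lower mean than a curve lying on a bright band, and a curve that crosses from one fringe to another yields a larger variance.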



4. Search Methods 

A search is required to find a best fit curve for a fringe line in terms of a minimum 
intensity error as calculated by the objective function. Three main types of search 
method can be identified, namely; exhaustive, calculus-based and random [5]. 

An exhaustive search looks at objective function values at every point in the search
space, one at a time. Such searches are usually discounted due to lack of efficiency.
In this problem, the total number of search permutations for a 512x512 pixel image
and a single B-spline curve generated using two intermediate control points each with
two degrees of freedom and two end points each with one degree of freedom is 512^6.
For N fringe lines the total number of permutations becomes Nx512^6. Searching this
number of combinations without direction is not efficient.
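
The size of this search space follows directly from the degrees of freedom. The short Python calculation below is a worked check; the variable names are chosen for this example only.

```python
positions_per_axis = 512        # pixel positions along one image axis
intermediate_points = 2         # two degrees of freedom each (x and y)
end_points = 2                  # one degree of freedom each (y only)

degrees_of_freedom = intermediate_points * 2 + end_points * 1
search_space = positions_per_axis ** degrees_of_freedom

print(degrees_of_freedom)       # 6
print(search_space)             # 18014398509481984, roughly 1.8e16
```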

Calculus based methods generally perform a search in a direction related to the 
gradient of the objective function. This can be compared to hill-climbing as the search 
proceeds in the steepest permissible direction (i.e. downhill to find a minimum). 
Discrete hill climbing has been studied extensively and is often used as a baseline
search for comparison purposes.

There are several types of random search algorithm. Amongst the most reported are 
genetic algorithms (GAs) which attempt to find the global minimum (or alternatively a 
maximum) using heuristics based on a model which attempts to mimic the mechanics 
of natural biological evolution [5], [6]. They operate on a population of potential 
solutions stochastically choosing and using the fittest solutions to produce more 
generations of potential solutions which it is hoped will yield better approximations to 
the final optimal solution. Another contrasting type of random search algorithm is the 
random mutation hill climbing method (RMHC) [6]. This has a random change 
element but iterates from a single solution rather than a population of solutions as used 
by the genetic algorithm. 

The implementation and comparison of a discrete hill climbing (steepest descent) 
search, random mutation hill climbing search and a genetic search are discussed 
below. In each case once a fringe line is found the error space surrounding it is altered 
by zoning with a high error value (see Fig. 1(b)). This forces a search for another 
fringe curve once one has been found. The search algorithm is not looking for a global 
minimum but a set of local minima (low areas of intensity representing fringe lines). 



4.1 Discrete Hill Climbing 

The steepest descent hill climbing search is implemented using the following
algorithm steps:

1. choose a set of control points at random

2. set as current curve and calculate its intensity error 

3. move each control point by one pixel in all degrees of freedom and calculate the 
intensity error for each curve 

4. if a new curve has a better intensity error than current curve then set as current 
curve and repeat step (3) 

5. if none of the new curves has a better intensity error than the current curve then the 
current run has ended 

6. if either no best curve has been saved (first pass) or the current curve at the end of 
run in step (5) has a better error than the stored best error curve, then set the current 
curve as the best curve and repeat from step (1) until either the error goal or a 
maximum number of iterations is achieved 







7. if current curve from end of run step (5) is not better than saved best curve then 
discard current curve and repeat from step (1) until either error goal or the 
maximum number of iterations is achieved. 
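
The steps above can be sketched in Python as a steepest descent with random restarts. This is a minimal illustration, not the authors' Matlab implementation: the curve is flattened into a list of integer coordinates, `error` is any objective function supplied by the caller, and the names and the `max_restarts` parameter are assumptions of this example.

```python
import random


def steepest_descent(start, error, lower, upper):
    """Move one coordinate by one pixel at a time until no neighbour is better."""
    current, current_err = list(start), error(start)
    while True:
        best, best_err = None, current_err
        for k in range(len(current)):
            for step in (-1, 1):
                cand = list(current)
                cand[k] += step
                if lower <= cand[k] <= upper:
                    err = error(cand)
                    if err < best_err:
                        best, best_err = cand, err
        if best is None:                  # no neighbour improves: run has ended
            return current, current_err
        current, current_err = best, best_err


def hill_climb_with_restarts(error, dims, lower, upper, goal, max_restarts, rng=None):
    """Restart from random points, keeping the best local optimum found."""
    rng = rng or random.Random()
    best, best_err = None, float("inf")
    for _ in range(max_restarts):
        start = [rng.randint(lower, upper) for _ in range(dims)]
        cand, err = steepest_descent(start, error, lower, upper)
        if err < best_err:                # keep only if better than the stored best
            best, best_err = cand, err
        if best_err <= goal:
            break
    return best, best_err
```

Each restart discards everything learned in the previous run except the stored best curve, which is the behaviour discussed later in the results.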

4.2 Random Mutation Hill Climbing 

The random mutation hill climbing search is implemented using the following
algorithm steps:

1. randomly generate a set of curves and choose the curve with the lowest error (seed)
and store it as the initial curve with the best error

2. choose a single control point at random 

3. if it is an intermediate control point randomly generate new x,y co-ordinates 

4. if it is an end point randomly generate new y co-ordinate 

5. calculate the intensity error of the new line 

6. if the intensity error of the new line is less than the current best error then store the 
new line and update the best error 

7. repeat from step (2) until either the error goal or the maximum number of iterations 
is achieved 
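
A minimal Python sketch of these steps is given below. It assumes that a curve is a list of (x, y) control points whose first and last points are pinned to the left and right image edges, and that `error` is an objective function supplied by the caller; all names and default values are illustrative assumptions.

```python
import random


def random_curve(n_points, width, height, rng):
    """End points sit on the left and right image edges; the rest are free."""
    xs = [0] + [rng.randrange(width) for _ in range(n_points - 2)] + [width - 1]
    return [(x, rng.randrange(height)) for x in xs]


def mutate(curve, width, height, rng):
    """Re-draw one randomly chosen control point within its degrees of freedom."""
    k = rng.randrange(len(curve))
    new = list(curve)
    if k in (0, len(curve) - 1):
        new[k] = (curve[k][0], rng.randrange(height))        # end point: y only
    else:
        new[k] = (rng.randrange(width), rng.randrange(height))
    return new


def rmhc(error, n_points, width, height, goal, max_iters, rng, n_seeds=150):
    """Random mutation hill climbing from the best of n_seeds random curves."""
    seeds = [random_curve(n_points, width, height, rng) for _ in range(n_seeds)]
    best = min(seeds, key=error)
    best_err = error(best)
    for _ in range(max_iters):
        if best_err <= goal:
            break
        cand = mutate(best, width, height, rng)
        cand_err = error(cand)
        if cand_err < best_err:           # accept only strict improvements
            best, best_err = cand, cand_err
    return best, best_err
```

Because a mutation is accepted only when it lowers the error, the stored error never increases from one iteration to the next.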



4.3 Genetic Algorithm 

The genetic search is implemented using the following algorithm steps [5]: 

1. create a population of curves by generating a population of control points at 
random positions 

2. find the intensity error value for each curve (fitness value) 

3. stochastically choose a number of pairs of curves 

4. randomly combine the control points within each pair of curves (recombination) 

5. replace curves with worst error in the previous population with newly created 
curves (elitist reinsertion) 

6. repeat from step (2) until either the error goal or the maximum number of iterations 
(generations) is achieved. 

In genetic algorithm terminology the implementation above is referred to as a 
simple genetic algorithm (SGA). Analogies can be made between (i) a gene and a 
single control point and (ii) a whole string of control points representing a curve and a 
chromosome. The number of chromosomes (i.e. potential curve solutions) is called the 
population. Each potential curve solution is assigned a fitness (line intensity error) 
value. 

Curves are sorted in rising order of error values with lowest error curves at the top 
of the list. Pairs of curves (parents) are randomly selected with a greater sampling 
probability given to low error curves than high error curves. An offspring curve is 
created from each pair of parents. The number of offspring generated is a percentage
of the total population size (recombination percentage). To create a single offspring
curve, corresponding x and y control point values from two selected curves are 
recombined along a line joining pairs of points. A mutation operator is also employed 
at every generation which creates a new set of random points for a percentage of the 
offspring to prevent permanent fixation at a particular local minimum. 
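
The genetic search described above can be sketched in Python as follows. This is a simplified illustration under stated assumptions: selection uses linear rank-based weights, recombination places each child control point at a random position on the line joining the two parents' points, and mutation replaces a whole offspring curve with random points. The function names are hypothetical; the default values mirror the settings reported later in the paper.

```python
import random


def random_curve(n_points, width, height, rng):
    """End points sit on the left and right image edges; the rest are free."""
    xs = [0] + [rng.randrange(width) for _ in range(n_points - 2)] + [width - 1]
    return [(x, rng.randrange(height)) for x in xs]


def recombine(parent_a, parent_b, rng):
    """Place each child control point on the line joining the parents' points."""
    child = []
    for (xa, ya), (xb, yb) in zip(parent_a, parent_b):
        t = rng.random()
        child.append((round(xa + t * (xb - xa)), round(ya + t * (yb - ya))))
    return child


def genetic_search(error, n_points, width, height, goal, max_generations, rng,
                   pop_size=150, recombination=0.30, mutation=0.05):
    population = [random_curve(n_points, width, height, rng) for _ in range(pop_size)]
    n_offspring = max(1, int(recombination * pop_size))
    for _ in range(max_generations):
        population.sort(key=error)                  # lowest error first
        if error(population[0]) <= goal:
            break
        # Assumption: linear rank weights favouring low-error curves.
        weights = [pop_size - rank for rank in range(pop_size)]
        offspring = []
        for _ in range(n_offspring):
            a, b = rng.choices(population, weights=weights, k=2)
            child = recombine(a, b, rng)
            if rng.random() < mutation:             # mutated offspring: new random points
                child = random_curve(n_points, width, height, rng)
            offspring.append(child)
        # Elitist reinsertion: offspring overwrite only the worst curves.
        population[-n_offspring:] = offspring
    population.sort(key=error)
    return population[0], error(population[0])
```

Since only the worst curves are replaced, the best curve of each generation always survives into the next.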



5. Results and Discussion 

Repeated testing of each search algorithm has been performed using the image shown 
in Fig. 1(b) with each run terminating on either reaching an error goal or a maximum 
time of 30 minutes. All search algorithms use the same objective function as discussed 
in Section 3 and an error goal of 0.4 which was determined sufficient to describe the 
path of a fringe while being realistically achievable with the time limit. A comparison 
of the mean, median and minimum search times together with the number of times a 
fringe search exceeds the time limit of 30 minutes is shown in Table 1. The results are 
compiled from 50 repeated trials representing 150 fringe fits per algorithm (450 total 
trials) for the test image. These results have been used to plot the search time 
histograms for each method shown in Fig 2. The bin size is 120 seconds. 



Table 1. Comparison of mean, median, and minimum search times for fitting fringes with an
error goal of 0.4. These raw results have units of seconds and were obtained using Matlab on a
Pentium II PC running at 300MHz. Also calculated is the percentage of searches that exceed a
maximum time limit set at 30 minutes.

                                   Hill Climbing   Random Mutation     Genetic
                                   [s]             Hill Climbing [s]   Algorithm [s]
Mean search time                   484             51                  74
Median search time                 427             36                  58
Minimum search time                1.7             32                  36
Searches exceeding maximum time    14%             0.7%                0.7%



The experimental results show that both the random mutation hill climbing 
algorithm and the genetic algorithm improve the search time performance compared 
with the hill climbing method. It is instructive to compare the strategies adopted by 
each search method in order to gain an understanding of why one method performs 
better than another in this application. 






Fig. 2. Histogram of search times (horizontal axis: Time (s), bins centred from 60 s to 1740 s)

The hill climbing method works from a single curve until a best local optimum is 
found. The curve is stored if it is the best found so far, but the components (i.e. good 
genes in a GA) that make the curve better than the others are ignored and not used in 
finding the next starting point. The method has no memory of good components. The 
fact that a set of control points are chosen at random each time the search starts means 
that the method is not restricted when finding good solutions by ridges in the error 
space. 

The RMHC method works on a single curve from start to finish. It always keeps the 
components that make a curve a good fit and adjusts these individually to reduce the 
error value of the curve. Therefore, unlike the traditional hill climbing method, it 
memorizes good components of a solution and run time performance is improved. One 
problem with the method is that ridges in the error space restrict the movement of the 
solution curve. To cross a ridge one component (i.e. control point) at a time would 
need to increase the error value of the curve which is against the rules of the 
algorithm. To reduce the chance of getting stuck between two ridges an initial 
population of 150 seed curves are randomly distributed over the image from which the 
curve with the lowest error is chosen. Reducing this population size decreases the 
minimum search time but increases the probability of not finding a suitable seed 
curve. The RMHC algorithm presented here should be considered as a hybrid method 
since it combines the use of an initial population and a random search. 






The genetic algorithm works on a large number of curves and keeps a variety of 
components including the good control points to build on. It is not affected by ridges 
as a population of points are kept and random mutation allows free movement of 
control points throughout the image. A large population size gives a greater variety in 
the gene pool while a low recombination percentage reduces the computational 
overhead involved with calculating large numbers of error values. Choosing a 
recombination percentage of 30% with a population size of 150 and a mutation rate 
of 5% provided the best performance with the coding method used. 

The method of searching for one fringe after another ignores a parallel method 
whereby a genetic algorithm uses sub-populations to search for all fringe lines 
simultaneously. It was found that when using sub-populations the genetic algorithm 
evolves a disproportionate number of curves that cross one another which are 
physically unrealistic solutions as fringes do not overlap. When additional error 
testing was added to prevent this from occurring it introduced an additional time 
penalty. The sequential fringe line search with penalty zoning prevents solutions 
whereby curves cross one another and allows a direct comparison to be made between 
the different search methods as the same objective function can be used by each 
method. 

6. Conclusion 

The paper has discussed the implementation and comparison of three search methods 
for fitting B-spline curves to filtered shearographic fringe patterns with error goal and
maximum search time parameter settings. It has been found that both the random 
mutation hill climbing and the genetic search improve search time performance as 
compared to hill climbing. The parametric modelling approach fits a line to the 
general trend of a fringe which is adequate to describe the specific type of anomaly 
(pinching of fringe lines in pipes) that this method was developed to search for. Small 
scale anomalies have not been considered as they have a limited effect on the integrity 
of a pipe in comparison to those that exhibit a large scale stress gradient. This 
approach effectively compresses the relevant information in a shearographic fringe 
image so that it can be stored as a set of control points. The image compression 
properties of this approach would be especially significant for high resolution systems 
(e.g. 2048x2048). 



References 

1. Jones, R. & Wykes, C. (1989) Holographic and Speckle Interferometry, 2nd
Edition, Cambridge University Press, ISBN 0-521-34417-4.

2. Chambard, J. P., Colon, E., Smiegielski, P. (1997) Applications of holographic
and speckle interferometry in industry. Fringe 97 Conference on the automatic
processing of fringe patterns, Bremen, Academic Verlag Series on Optical
Metrology, pp 520-523.

3. Steinchen, W., Yang, L. X., Schuth, M. (1996) TV-shearography for measuring
3D-strains, Strain, May, pp 49-57.

4. Dewey, B. R. (1988) Computer Graphics for Engineers, Harper & Row
Publishers, ISBN 0-06-041670-X.

5. Goldberg, D. E. (1989) Genetic Algorithms in Search, Optimisation and Machine
Learning, Addison Wesley Longman Inc., ISBN 0-201-15767-5.

6. Mitchell, M. (1996) An Introduction to Genetic Algorithms, Massachusetts
Institute of Technology Press, ISBN 0-262-13316-4.




Vision Guided Bin Picking and Mounting in a 
Flexible Assembly Cell 



Martin Berger, Gernot Bachler, and Stefan Scherer 

Computer Graphics and Vision, Graz University of Technology 
{berger,bachler,scherer}@icg.tu-graz.ac.at



Abstract. In this contribution a vision system for the flexible assembling
of industrial parts is presented. A new three step approach is described.
It consists of three independent vision guided modules. The picking module
makes it possible to pick objects from an unorganized heap or out of a bin,
the pose determination module delivers the exact position of the isolated
object and the surveillance module verifies the success of mounting the
parts. This allows all the system stages to consist of standard components,
while ensuring a high degree of flexibility, adaptability and robustness.
Successful results achieved with a prototype system implemented at our
industrial cooperating partner are presented.

Keywords: Bin Picking, Industrial Assembling, Vision, CAD Model Fitting



1 Introduction 

The task of picking objects from an unorganized bin for industrial production
purposes has been a great challenge in vision and automation research over the
past ten years [16,17,1,12]. The goal was to enable a robot arm to pick some
parts out of a bin without any knowledge of their shape or position. As in many
cases the pieces to be assembled do not come in a bin or are well organized in
appropriate containers, this topic became less significant because there was
little prospect of it coming into operation. Nevertheless, for rather small
series (typically thousands of pieces) it becomes very expensive to adapt a
production line to new parts. Thus, a device able to manage the stated task
would help to save money and time. Simplifying the hardware, using standard
components instead of special ones, and embedding modules of intelligent vision
software is what we propose in the present contribution. Previous approaches
tried to solve the task of picking and mounting in one step, yielding complex
systems and very sophisticated algorithms, while mainly suffering from a lack
of flexibility. We introduce a novel three-step concept consisting of bin
picking, pose determination and mounting surveillance. In the first step no
knowledge about the parts is needed; steps two and three assume that simplified
CAD models are given. The work was carried out in cooperation with local
industrial partners in the field of automation, therefore another goal is to
ensure the real time capability of the developed algorithms.



R. Loganantharaj et al. (Eds.): IEA/AIE 2000, LNAI 1821, pp. 109-117, 2000.
© Springer-Verlag Berlin Heidelberg 2000






The paper is organized as follows. Sect. 2 describes previous and related
research, Sect. 3 introduces the new approach in detail. The system modules are
described in Sect. 4 and Sect. 5. A description of the working environment is
given in Sect. 6, together with some practical application remarks. Sect. 7
concludes the paper and gives a brief outlook to future work.



2 Related Work 

The problem of picking parts from a heap was addressed by various authors.
Rahardja and Kosaka [16] extract simple features such as circular and polygonal
parts from stereo images representing complex objects. Their system
incorporates both object identification and pose determination in one single
step.

The work of Ikeuchi [12] is based on a CAD representation of an object which
makes it possible to build an interpretation tree offline and to select the
optimal features at each determining process. The algorithms do not emphasize
any type of backtracking.

Trobina and Leonardis [17] present an object-grasping system based on range
images. They detect antipodal planar patches where a robot arm can apply
a gripper. The information is extracted piecewise by a Recover-and-Select
paradigm. After the selection of the best grasping hypothesis, an object is
removed from the pile. No investigations about the precision of grasping or the
assembling capability of the system are presented.

Al-Hujazi and Sood [1] propose to determine grip points for a vacuum gripper
from dense range images. They detect edges by residual analysis, segment
the (single) object into an appropriate number of surface patches and then
calculate a grip point.

Pose determination of objects in 3D was considered as a mandatory byproduct
of object recognition systems. The task of determining the pose of objects in 3D
by fitting some (even parametric) CAD models to images was first addressed by
Lowe [14], based on some work on perceptual organization and grouping [13]. The
algorithm refines the 3D pose parameters according to the observed errors in a 2D
image. Araujo et al. [2] proposed extensions to Lowe's algorithm. Other authors
addressed the mathematical analysis of the 'point cloud to point cloud'
matching [9] and mapping in presence of noise given the correspondences [11,10].
Other known recognition (and therefore pose determination) systems in a
calibrated environment are ACRONYM (Brooks [7]), HYPER (Ayache and Faugeras [3])
and SCERPO (Lowe [13]). In the paper which introduced the RANSAC (Random Sample
Consensus) paradigm, Fischler and Bolles [8] also presented an early recognition
system. The establishment of distance metrics for the alignment of models to
images was addressed by Weinshall and Basri [19].






3 A Three-Step Concept 

In contrast to the stated approaches, we split the complex task of flexible
automated assembling into three consecutive subtasks. They are almost
independent of one another, to guarantee highest flexibility and robustness
together with easy adaptability. The three stages of our approach are:

- Bin Picking. To enable highest flexibility in picking a wide variety of
objects, we try to identify planes on a heap of unorganized parts, where a
vacuum gripper can be applied. A stereo setup together with a grid projector
is placed over the working region. The robot (or an appropriate manipulator)
picks an object identified to lie on top and drops it on a separate workplace.
The algorithm is based on a structured light approach combining shape analysis
and stereo matching. Note that no further assumptions on the shape or on
the pose of the objects are made.

- Pose Determination. Once an object is separated from an unorganized
heap, its exact pose is determined, now including some knowledge about it.
The first submodule calculates a pose guess, the second fits a simple CAD
model iteratively to image features. It is also possible to reject an object if it
does not belong to a given class or if it exceeds some tolerances in shape.
A manipulator equipped with a grasp tool can pick up the piece suitably
and mount it.

- Assembly Inspection. The third stage of our approach ensures the correct
mounting of the part and detects failures. This step is done in the traditional
way for now and is subject to future research.

The following sections will describe the bin picking and the pose determination 
module in detail. 

4 Plane Detection, Grip Point Selection and Picking 

This section will outline the image processing steps to robustly detect planar 
surfaces of parts on a heap and calculate their location in 3D in order to perform 
the picking. Fig. 1 shows a grayscale image pair of a heap of chain parts and a 
regular, high contrast grid projected on it. 

The grid is segmented with a local threshold method and morphologically 
thinned to obtain a grid skeleton, which is analyzed and stored in a locally 
perspective projection distortion invariant representation, as can be seen from 
Fig. 2. Adjacent intersections are represented by accordant adjacent points in 
the plane representation (see [4] for details). Connected intersections are called 
plane candidates. 

Next the plane candidates from the left and the right image are matched,
missing intersections are detected and false matches are rejected. Once
correspondences have been established, the subpixel positions of the
intersections are calculated from the grayscale images, starting from the
position achieved from the segmented and skeletonized grid. Then the grid
points are reconstructed in 3D, a grip point is selected on the plane and a
normal vector is calculated. A selection procedure is applied to establish the
topmost largest plane. Its coordinates and normal vector are passed to the
control of the robot arm [5]. Fig. 3 (b) shows a robot picking one of the chain
parts from Fig. 1.

Fig. 1. An image pair showing a heap of unorganized parts with a high contrast
grid projected on it.
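
One possible way to obtain a grip point and a normal vector from reconstructed 3D grid points is a least-squares plane fit. The Python sketch below is an assumption made for illustration and is not the procedure of [5]: it fits z = a*x + b*y + c, takes the centroid as grip point, and assumes that the plane is not vertical in the chosen coordinate frame. The function names are hypothetical.

```python
import math


def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def fit_plane(points):
    """Least-squares plane z = a*x + b*y + c; returns (centroid, unit normal)."""
    n = len(points)
    sx = sum(x for x, _, _ in points)
    sy = sum(y for _, y, _ in points)
    sz = sum(z for _, _, z in points)
    sxx = sum(x * x for x, _, _ in points)
    syy = sum(y * y for _, y, _ in points)
    sxy = sum(x * y for x, y, _ in points)
    sxz = sum(x * z for x, _, z in points)
    syz = sum(y * z for _, y, z in points)
    A = [[sxx, sxy, sx], [sxy, syy, sy], [sx, sy, n]]
    rhs = [sxz, syz, sz]
    d = det3(A)
    if abs(d) < 1e-12:
        raise ValueError("points are degenerate (collinear or too few)")
    # Cramer's rule: replace one column of A by the right-hand side at a time.
    coeffs = []
    for col in range(3):
        m = [row[:] for row in A]
        for r in range(3):
            m[r][col] = rhs[r]
        coeffs.append(det3(m) / d)
    a, b, _ = coeffs
    length = math.sqrt(a * a + b * b + 1.0)
    normal = (-a / length, -b / length, 1.0 / length)
    centroid = (sx / n, sy / n, sz / n)
    return centroid, normal
```

Among several candidate planes, the topmost one could then be chosen by comparing the centroids along the vertical axis of the workspace.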



5 Pose Determination of Isolated Objects 

This section presents the second module of the assembly cell. After picking an 
object, the robot places it on a separate workplace and a high resolution camera 
grabs an image. The pose determination module calculates a pose and passes 
those points to the robot where the object can be grasped for mounting. The 
pose is determined by fitting a CAD model to image edge features using an 
iterative refinement procedure. To ensure fast convergence, a rather precise initial 
guess of the pose must be given. A very powerful approach for obtaining a 
coarse pose (we will call it pose estimation) is the Parametric Eigenspace [15]. 




Fig. 2. Each intersection of a plane is represented as a pixel in the perspective in- 
variant plane representation. Segmented plane from (a) left view, (b) right view, 
(c) Matching results for (a)-(b), intersection points with no correspondence are 
marked black. 









Fig. 3. (Left) The experimental setup consisting of a robot arm (1), a stereo rig 
(2), a grid projector (3) and a workplace (4). (Right) The robot picking a part 
from a heap of unorganized objects. 



This PCA-based method (PCA: Principal Component Analysis) deals with the
appearance of objects depending on the viewpoint parameters. In the present
case a one-parametric description was chosen to obtain a representation of each
stable pose (see Fig. 4) sampled at a 10-degree resolution. The eigenspace
representation is generated from these images in an offline learning phase.
See [6] for details.

Fig. 4. Four stable poses of an industrial part (Stable Pose 1 to Stable Pose 4).

In the online pose estimation, this method provides the stable pose and the
approximate rotation angle φ of the inspected object around the vertical axis.
Since the parametric eigenspace is an object-centered method, there is no a
priori information available on the translation on the workplace. This drawback
can be eliminated by virtually placing the CAD model on the workplace (see
Fig. 5 (a)), rotating it by the angle φ from the eigenspace around its vertical
axis (Fig. 5 (b)) and translating it onto the image of the object (Fig. 5 (c)).
From the translation in the image, which can be calculated by fitting the
bounding rectangles of the projected model and the image of the object, the
translation V on the workplace is estimated. Integrating the rotation angle φ,
the translation vector V and the calibration data results in a pose estimate
in 3D.

Fig. 5. The steps performed by the pose determination module visualized by
superimposing the model projection on the image. The white dot indicates the
center of gravity in 3D. (a) The CAD model placed on the center of the
workplace, (b) rotation corrected, (c) rotation and translation corrected and
(d) fitted. Note the difference on the left side of the object before (c) and
after (d) the model fitting process.

The subsequent fitting step (we call it pose determination) starts from this
initial guess. An adapted hidden line algorithm calculates the line clippings
in 3D world coordinates, which is essential for the 2D-3D feature
correspondence and the 3D parameter refinement. Note that we are given only
three degrees of freedom instead of six. Because of the "Ground Plane
Constraint" (see for example [2]) they are limited to the rotation angle φ and
the two-dimensional shift vector V. Some error measures are necessary for the
pose parameter assessment. We define them as the perpendicular distances
between model edges and image gradients, as suggested by Lowe [14]. From these
error vectors in 2D we can iteratively refine the pose parameters by solving
the equation



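\[
\begin{pmatrix} \Delta\phi \\ \Delta V_x \\ \Delta V_y \end{pmatrix}
= J^{-1}
\begin{pmatrix} e_1 \\ e_2 \\ e_3 \end{pmatrix}
\]

where J is the Jacobian of the 3D-2D mapping function and e_i are the observed
errors in the image.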
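
When more error measurements are available than pose parameters, the Jacobian is no longer square, and an update of this kind is commonly computed in the least-squares sense. The following form is a sketch under that assumption and is not taken from the paper; the parameter vector p and the number of measurements n are notation introduced here, and the sign of the update depends on how the errors are defined.

```latex
\[
\Delta p = \left(J^{\top} J\right)^{-1} J^{\top} e,
\qquad
p \leftarrow p + \Delta p,
\qquad
p = (\phi, V_x, V_y)^{\top},
\quad
e = (e_1, \dots, e_n)^{\top},
\quad n \ge 3
\]
```

For n = 3 and an invertible J this expression reduces to the product of the inverse Jacobian and the error vector.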

6 Practical Results 

In this section experimental results are presented. Fig. 3 (a) shows the industrial 
setup - a prototype of the assembly cell. There are three standard digital cameras, 
two of them for the bin picking and one for the pose determination. A common 
slide projector is used to texture the observed scene with the high contrast 
grid. The whole system is calibrated with the well-known Tsai method [18]. For 
learning the stable pose views in their various angular positions we used a 
turntable. In the current implementation the robot picks an object from a heap and 
places it on a separate workplace. After the execution of the pose determination 
module, the robot picks the object again and, because its orientation is now known, 
inserts it into a fitting. 



3D Plane Reconstruction        #         % 
Detected                    2190    100.00 
Correct                     2134     97.44 
Incorrect                     56      2.56 
Not detected                 190      8.68 
Reconstructed               2134     97.44 
Picked                      2026     92.51 
Missed                       108      4.93 



Table 1. Summary of the Bin Picking module; results from various test series. 
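As a quick arithmetic check on Table 1, the percentages appear to be taken relative to the 2190 detected planes. The short Python sketch below recomputes them under that assumption; the names `counts` and `percentages` are only illustrative.

```python
# Counts taken from Table 1; the percentages are assumed to be
# relative to the number of detected planes.
counts = {
    "Correct": 2134,
    "Incorrect": 56,
    "Not detected": 190,
    "Picked": 2026,
    "Missed": 108,
}
detected = 2190

percentages = {name: round(100.0 * n / detected, 2) for name, n in counts.items()}
for name, pct in percentages.items():
    print(f"{name:<13}{pct:6.2f} %")
```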



The picking module achieves a success rate of more than 92%, which is promising for 
industrial applicability. Table 1 summarizes the results of several test series. The 
pose estimation (i.e. the geometry-integrated eigenspace method) delivers pose 
parameters corrupted by an error of approximately ±3 degrees in rotation and 
less than 2 mm in both translation directions. This error is decreased significantly 
by the subsequent fitting, typically to under half a degree in rotation and 






Martin Berger et al. 



Pose Determination Error    |Δφ|     |ΔV_x|     |ΔV_y| 
Average                     0.30°    0.41 mm    0.37 mm 
Standard Deviation          0.21°    0.19 mm    0.15 mm 
Maximum                     4.23°    1.43 mm    1.18 mm 
Minimum                     0.01°    0.02 mm    0.01 mm 



Table 2. Evaluation of the pose determination module. 



half a mm in translation. Table 2 lists the errors for each calculated pose param- 
eter. 

The presented system runs completely on standard hardware components, which 
is a direct consequence of the three-step concept. All of the presented algorithms 
perform in real time. Despite this, there are still some possibilities left for in- 
creasing the speed. 

7 Conclusion and Future Work 

A novel approach to the problem of picking objects from a bin and mounting 
them was presented. The task is split into three independent subtasks. Plane 
detection is used to identify grip points on the parts of the unorganized heap. 
The matching and the stereo correspondence retrieval are computed by a novel 
structured light method, which combines them in one single processing step. 
After the isolation of a single part, the second module identifies its pose and 
enables a manipulator to grasp it precisely. The system uses a geometry-integrated 
eigenspace approach to estimate the pose of the object. This initial 
guess is refined by an image feature based model fitting. Test series proved the 
applicability of the stated three-step concept and the described algorithms for 
industrial environments. 

Future work will include grid analysis algorithms such as identification of sin- 
gularities, correspondence refinement and working height estimation. The pose 
estimation module will be completed by some fitting strategies. All parts of the 
system implemented until now are under heavy testing in an industrial environ- 
ment. 



References 

1. Al-Hujazi, E., and Sood, A. Range Image Segmentation with Applications to 
Robot Bin-Picking Using a Vacuum Gripper. IEEE Transactions on Systems, Man 
and Cybernetics 20, 6 (November/December 1990), 1313-1324. 109, 110 

2. Araujo, H., Carceroni, R. L., and Brown, C. M. A Fully Perspective Formu- 
lation to Improve the Accuracy of Lowe’s Pose-Estimation Algorithm. Computer 
Vision and Image Understanding 70, 2 (1998), 227-238. 110, 114 






3. Ayache, N., and Faugeras, O. D. HYPER: A new approach for the recogni- 
tion and positioning of two-dimensional objects. IEEE Transactions on Pattern 
Analysis and Machine Intelligence 8, 1 (1986), 44-54. 110 

4. Bachler, G., Berger, M., Rohrer, R., Scherer, S., and Pinz, A. A Vision 
Driven Automatic Assembly Unit: Robust Bin Picking. In Proceedings of the 23rd 
Workshop of the AAPR (Steyr, Austria, May 1999), pp. 205-213. 111 

5. Bachler, G., Berger, M., Rohrer, R., Scherer, S., and Pinz, A. A Vision 
Driven Automatic Assembly Unit. In Proceedings of the 8th Conference on 
Computer Analysis of Images and Patterns (Ljubljana, September 1999), pp. 375-382. 
112 

6. Berger, M., Bachler, G., Scherer, S., and Pinz, A. A Vision Driven 
Automatic Assembly Unit: Pose Determination from a Single Image. In Proceedings of 
the 23rd Workshop of the AAPR (Steyr, Austria, May 1999), pp. 215-224. 113 

7. Brooks, R. A. Symbolic Reasoning around 3-D Models and 2D-Images. Artificial 
Intelligence 17 (1981), 285-348. 110 

8. Fischler, M. A., and Bolles, R. C. Random Sample Consensus: A Paradigm for 
Model Fitting with Applications to Image Analysis and Automated Cartography. 
Communications of the ACM 24, 6 (June 1981), 381-395. 110 

9. Gold, S., Rangarajan, A., Lu, C.-P., Pappu, S., and Mjolsness, E. New 
Algorithms for 2D and 3D Point Matching: Pose Estimation and Correspondence. 
Pattern Recognition 31, 8 (1998), 1019-1031. 110 

10. Haralick, R. M., Joo, H., Lee, C., Zhuang, X., Vaidya, V. G., and Kim, 
M. B. Pose estimation from corresponding point data. IEEE Transactions on 
Systems, Man and Cybernetics 19, 6 (Nov./Dec. 1989), 1426-1446. 110 

11. Haralick, R. M., Lee, C.-N., Ottenberg, K., and Nolle, M. Analysis and 
Solution of The Three Point Perspective Pose Estimation Problem. In Proceedings 
of IEEE Conference on Computer Vision and Pattern Recognition (1991), pp. 592-598. 110 

12. Ikeuchi, K. Generating an Interpretation Tree from a CAD Model for 3D-Object 
Recognition in Bin-Picking Tasks. International Journal of Computer Vision 
(1987), 145-165. 109, 110 

13. Lowe, D. G. Three-Dimensional Object Recognition from Single Two- 
Dimensional Images. Artificial Intelligence 31, 3 (March 1987), 355-395. 110 

14. Lowe, D. G. Fitting parameterized three-dimensional models to images. IEEE 
Transactions on Pattern Analysis and Machine Intelligence 13, 5 (May 1991), 441-450. 
110, 115 

15. Murase, H., and Nayar, S. K. Visual Learning and Recognition of 3-D Objects 
from Appearance. International Journal of Computer Vision 14 (1995), 5-24. 112 

16. Rahardja, K., and Kosaka, A. Vision-Based Bin-Picking: Recognition and Lo- 
calization of Multiple Complex Objects Using Simple Visual Cues. In Proceedings 
of the International Conference on Intelligent Robotics and Systems (Osaka, Japan, 
November 1996), IEEE / RSJ. 109, 110 

17. Trobina, M., Leonardis, A., and Ade, F. Grasping Arbitrarily Shaped Objects. 
In Mustererkennung (Wien, 1994), W. G. Kropatsch and H. Bischof, Eds., vol. 1, 
Technische Universität Wien, Xpress, pp. 126-134. 109, 110 

18. Tsai, R. Y. A Versatile Camera Calibration Technique for High-Accuracy 3D 
Machine Vision Metrology Using Off-the-Shelf TV Cameras and Lenses. IEEE 
Journal of Robotics and Automation RA-3, 4 (August 1987), 323-344. 115 

19. Weinshall, D., and Basri, R. Distance metric between 3D models and 2D images 
for recognition and classification. IEEE Transactions on Pattern Analysis and 
Machine Intelligence 18, 4 (April 1996), 465-470. 110 



A Brokering Algorithm for Cost & QoS-Based Winner 
Determination in Combinatorial Auctions 



Aneurin M. Easwaran and Jeremy Pitt 

Intelligent and Interactive Systems, Department of Electrical & Electronic Engineering 
Imperial College of Science, Technology & Medicine 
Exhibition Road, London SW7 2BZ, England 
{a.easwaran, j.pitt}@ic.ac.uk 



Abstract. Deregulation of telecommunications has meant an increase in third-party 
service provision, personalized service delivery and integrated networks 
and media. The efficient allocation of services, without human intervention, to 
satisfy advanced service requirements spanning several networks is a crucial 
task. This can be modeled as a winner determination problem in combinatorial 
auctions where there are multiple services, service providers and winner 
determination criteria (like cost, bandwidth, delay, etc.) but we have shown the 
problem is NP-complete. This paper describes a new two-stage algorithm for 
optimal anytime winner determination. In the first stage, a hierarchical task 
network planner is used to decompose a task into subtasks that can be solved by 
the available services. In the second stage, a genetic algorithm with heuristics is 
used to find the optimal combination of service providers to provide the 
services identified. We show through various experiments that the genetic algorithm 
finds optimal solutions quicker than a modified depth-first search algorithm. 



1 Introduction 

One of the basic problems of open, multi-agent systems for the Internet is the connec- 
tion problem [9]. That is, each agent must be able to locate the other agents who may 
have capabilities which are necessary for the execution of tasks. The solution to this 
problem relies on using some ‘well-known’ agents and some basic interactions with 
them - matchmaking or brokering. 

We have developed a brokering system where there are client agents, service agents 
and one or more brokers. A client agent has a task which can be solved by a single 
service agent or a federation/combination of service agents. However, the client is not 
necessarily aware of which service agents are available or how a task can be 
decomposed and solved by a set of service agents in the best possible way. The client agent 
requests the agent service broker to “recommend” a set of service agents who are 
capable of solving a particular task. The broker maintains a repository containing 
current and correct information about operational service agents. Each service agent 
represents a particular service provider. There can be one or more service agents (pro- 
viders) providing the same service. The broker identifies a combination of service 
agents who have the capabilities to solve a client’s task. The broker decomposes a task 

R. Loganantharaj et al. (Eds.): IEA/AIE 2000, LNAI 1821, pp. 119-128, 2000. 

© Springer-Verlag Berlin Heidelberg 2000 







into subtasks and one or more service agents have the capability to solve each subtask. 
The service agents who provide the same service may vary in various measures like 
cost, bandwidth, delay, etc. The service agents who provide different services may 
collaborate in order to enhance some measure. The broker selects winners to solve a 
task based on multiple criteria/measures and ensures the overall measure of the set of 
winners is optimal or near-optimal. The client agent is notified by the broker of the 
service agents that are required to solve the task. The client is then free to initiate a 
dialogue with the service agents for the appropriate services. This paper focuses on 
brokering a combination of services to solve a task. 

Combinatorial auctions are auctions where bidders can bid on combinations of items. 
Combinatorial auctions are applicable to many real world situations. In an auction for 
the right to use railroad segments a bidder desires a bundle of segments that connect 
two particular points; at the same time, there may be alternate paths between these 
points and the bidder needs only one [1]. A set of services can be combined to im- 
prove the cost or quality of service of the combination. The services can be combined 
in many ways. For example, there can be a deal where one buys a service and gets 
another service for free, so that the combination of services has a cost 
equal to the cost of one service. Obviously it is cheaper to buy the combination than 
the individual services. The broker considers the various combinations in order to find 
the optimal solution (i.e. a set of service providers) based on multiple criteria. In 
combinatorial auctions bidders may place bids on combinations of items, whereas in our 
system service providers quote a particular value for a combination of services. 

While economics and game theory provide many insights into the potential use of such 
auctions, they have little to say about computational considerations. In this paper we 
address the computational complexity of combinatorial auctions. 



2 Winner Determination 

In essence, the winner determination problem is to find an optimal set of service 
agents to solve a client’s task - optimal in terms of criteria specified. The problem has 
two parts - satisfiability and optimization. 

Satisfiability: Given a set of services and a task, establish whether the task is 
satisfiable by the current set of services available. This amounts to identifying the 
services required to solve a task, which involves breaking a task into subtasks until there are only 
primitive tasks that can be solved by a combination of service agents. 

Optimization: Identify a set of winners or service providers based on multiple crite- 
ria, to provide the services identified to solve the client’s task. This paper mainly 
focuses on the optimization stage of winner determination. 

The broker must consider the various relationships that exist between services 
(where the respective service providers can be the same or different) in order to iden- 
tify a set of service agents to solve a task at the optimum measure(s). The service 
providers identified by the broker to solve a task are the winners. The relationship 
between services is one of three types: 







• Cooperation: Different services can work together regardless of who provides 
each service. 

• Benefit cooperation: Different services can work together regardless of who 
provides each service, but there is an added benefit for using particular service 
providers for the required services. Service combinations or deals are the result of this 
relationship. 

• No cooperation: Different services cannot work together regardless of who 
provides each service. 

The order in which services are combined matters in all three relationships. For 
example, it may not be possible to combine serviceA with serviceB (i.e. a no 
cooperation relationship), but combining serviceB with serviceA may have an added benefit 
(either a cooperation or a benefit cooperation relationship). Ordering of services is 
particularly important in the telecommunications domain. These relationships are 
quantified according to the criteria used for optimizing. For example, if winner 
determination is based on cost of services then a no cooperation relationship can be represented 
by a very large number (or ∞). The number of relationships between services grows exponentially 
as the number of services and service providers increases. 
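One possible way to quantify such order-dependent relationships for a cost criterion is a lookup table keyed by ordered pairs, as in the Python sketch below. The table contents, the name `combine_cost` and the choice to treat unlisted pairs as no cooperation are assumptions made for illustration only.

```python
import math

NO_COOPERATION = math.inf  # the services cannot be combined in this order

# Hypothetical cost-based relationship table: (first, second) -> combined cost.
relationship_cost = {
    ("serviceA", "serviceB"): NO_COOPERATION,  # A followed by B is not possible
    ("serviceB", "serviceA"): 7.0,             # B followed by A is possible
}


def combine_cost(first, second, table=relationship_cost):
    """Cost of using `first` and then `second`; the order of the pair matters."""
    # Assumption: pairs missing from the table are treated as no cooperation.
    return table.get((first, second), NO_COOPERATION)
```

Because the key is an ordered pair, `combine_cost("serviceA", "serviceB")` and `combine_cost("serviceB", "serviceA")` can differ, which mirrors the asymmetry described above.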

Proposition 2.1: Winner determination is NP-complete. 

Proof: The Travelling Salesman model assumes that the decision-maker has 
determined a priori which cities will be sequenced; i.e. which cities will be visited. 
Consider a generalization of the TSP, which combines the decisions of city selection and 
city sequencing. Instead of pre-selecting the cities to be visited, the generalized model 
assumes the cities have been grouped into mutually exclusive and exhaustive states. 
The generalized travelling salesman problem (GTSP) [10] is then to find a minimum cost path which 
includes exactly one city from each state. The GTSP is transformed to an agent service 
brokering problem: each state is 
transformed to a service and each city in a state is transformed to a service provider 
providing the service. The cost of travelling between two states is transformed to the cost 
of purchasing two services. The generalized travelling salesman problem is 
NP-complete. A detailed proof can be found in [3]. ■ 

The search space for winner determination is greater than the GTSP when we take into 
account the number of possible ways in which a set of services can be combined in a 
route. In the GTSP, adding a new node to a route increases the total cost of the route 
by the amount required to travel from the last node (n-1) to the new node (n). In 
the agent service brokering problem, by contrast, adding a new node does not necessarily 
increase the total cost of the route by the amount required to purchase the new node 
because the cost of adding a new node is dependent on the history of the route. For 
example, say, there is a route with four services (S1 S2 S3 S4) and the total cost of the 
route is the sum of two deals (i.e. benefit cooperation relationship - S1S2 and S3S4). 
Now adding a new service (S5) to the route may bring into effect two different deals 
(S1S2S3 and S4S5) which may be cheaper than buying S5 and the previous two deals - S1S2 
and S3S4. 
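The history dependence can be made concrete with a small Python sketch. It is only an illustration with invented prices: the function `cheapest_cover` (a hypothetical name) finds the cheapest way to buy a fixed route as consecutive single services and deals using simple dynamic programming, which is not the method used by the broker.

```python
def cheapest_cover(route, single_price, deal_price):
    """Minimum cost of buying `route` as consecutive singles and/or deals."""
    n = len(route)
    inf = float("inf")
    best = [0.0] + [inf] * n  # best[i] = cheapest cost of route[:i]
    for end in range(1, n + 1):
        for start in range(end):
            segment = tuple(route[start:end])
            if len(segment) == 1:
                price = single_price[segment[0]]
            else:
                price = deal_price.get(segment, inf)  # unknown deals are invalid
            best[end] = min(best[end], best[start] + price)
    return best[n]


# Invented prices: every single service costs 10.
single = {s: 10 for s in ["S1", "S2", "S3", "S4", "S5"]}
deals = {
    ("S1", "S2"): 15,
    ("S3", "S4"): 15,
    ("S1", "S2", "S3"): 22,
    ("S4", "S5"): 12,
}

cost_four = cheapest_cover(["S1", "S2", "S3", "S4"], single, deals)
cost_five = cheapest_cover(["S1", "S2", "S3", "S4", "S5"], single, deals)
```

With these prices the four-service route is best bought as the deals S1S2 and S3S4, while appending S5 switches the optimum to S1S2S3 and S4S5, so the total rises by less than the price of S5 alone.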

Proposition 2.2: The number of possible solutions to a problem is 
$n! \cdot (m_1 \cdot \ldots \cdot m_n) \cdot 2^{n-1}$, where $n$ is the number of services, $m_x$ is the number of 
service providers providing service $x$, and there are $2^{n-1}$ possible service 
combinations. The number of possible solutions to a GTSP is $n! \cdot (m_1 \cdot \ldots \cdot m_n)$, where $n$ is 
the number of states and $m_x$ is the number of cities in state $x$. 







Proof: We outline a proof to show how $2^{n-1}$ is derived. Two services can either be 
combined (AND operator) or not combined (OR operator). If there are n services then 
there are (n-1) places where either of the operators could appear. Since there are 2 
operators and n-1 places where the operators can appear, there are $2^{n-1}$ ways in 
which n services can be combined. ■ 
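To get a feeling for the sizes involved, the two counts in Proposition 2.2 can be evaluated directly. The Python sketch below does this; the function names are hypothetical and the provider counts are invented.

```python
from math import factorial, prod


def brokering_solutions(providers_per_service):
    """n! * (m_1 * ... * m_n) * 2^(n-1) for n >= 1 services."""
    n = len(providers_per_service)
    return factorial(n) * prod(providers_per_service) * 2 ** (n - 1)


def gtsp_solutions(cities_per_state):
    """n! * (m_1 * ... * m_n) for n states."""
    return factorial(len(cities_per_state)) * prod(cities_per_state)


# Three services offered by 2, 3 and 1 providers respectively.
example = [2, 3, 1]
print(brokering_solutions(example), gtsp_solutions(example))
```

For this example the brokering problem has 3! · 6 · 4 = 144 candidate solutions compared with 36 for the corresponding GTSP, a factor of $2^{n-1} = 4$.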



3 Architecture Overview 

The broker consists of two components to solve each stage of the winner determina- 
tion problem - a planner and an optimizer. The planner receives a task from a client 
which is decomposed into subtasks using predefined plan methods. The planner also 
identifies the services that are required to solve each subtask. The planner then notifies 
the optimizer of the services that are required to solve each subtask. 

The planning process developed is based on a hierarchical task network (HTN) 
planning formalism. The planner searches through plan space to solve the planning 
stage of the problem. Conventional wisdom in the planning community, supported to a 
large extent by the fielded applications to date, holds that most real world domains are 
best modeled with hierarchical task network planning models [5]. The planner starts 
with a task or goal, and on each iteration adds one more step, i.e. decomposes a task 
into subtasks until there are only primitive tasks. It does this by choosing some 
operator, either from existing steps of the plan or from the pool of operators, that 
achieves a complex task. If this leads to an inconsistent plan, it backtracks and tries 
another branch of the search space. To keep the search focused, the planner only 
considers adding steps that serve to achieve a complex task that has not yet been 
achieved. The operators are mainly various task decomposition methods which 
capture human expertise. We do not present any detailed information on the planner due to 
space constraints. 

The optimizer identifies the best combination of service providers to provide the 
services identified. In order to achieve this it considers the various service combina- 
tions (deals) with same or different service providers. The optimizer is a genetic algo- 
rithm (GA) with heuristics that yields an anytime algorithm. 

There is an obvious need to describe agent services, tasks, plan methods and deals 
in a common language before any service advertisement, request or even brokering 
between agents can take place. In building the agent service broker, we need two types 
of knowledge which must be described in some common language: 

1. Domain factual knowledge: Knowledge about the objective realities in the domain 
of interest (services, relations, etc.) 

2. Task-solving (plan methods) knowledge: Knowledge about how to achieve various 
tasks. This knowledge would be in the form of a task-solving method specifying how a 
class of tasks can be accomplished. 

The two types of knowledge are represented using frames in our system. Frame 
representation systems are one of the primary technologies used for large scale 
knowledge representation in AI [7]. The representation scheme used was designed to be 
simple and general. Simplicity is essential in order for the broker agent to run 
efficiently. It is also essential in order to support knowledge acquisition and authoring. 







4 Genetic Algorithm and Heuristics 



The second stage of the brokering process involves selecting a combination of service 
providers to provide the identified services. This stage is solved to optimality or near 
optimality, depending on the size of the problem and the time available to the user, by 
a genetic algorithm (GA). The GA is a highly parallel mathematical algorithm that 
transforms a population of individual objects, each with an associated value of fitness, 
into a new generation of the population, using the Darwinian principle of survival and 
reproduction of the fittest and analogs of naturally occurring genetic operations such 
as crossover and mutation. GAs have been successfully applied to a problem called 
the knife change minimization problem[2], which is an instance of the GTSP. 

Problem representation is critical to the success of a GA. Each possible point in the 
search space of the problem must be encoded as a character string (i.e., as a 
chromosome). There are a number of ways to represent a problem. The representation used by 
the GA is diagrammed in Fig. 1. We represent each individual in a population using 
three chromosomes: 

1. Service chromosome - Represents the services that are required to solve a task. 
Service combinations are based on the order in which the services appear in the chro- 
mosome. 

2. Provider chromosome - Represents the service provider providing the service 
shown by the service chromosome. For example, service S2 is provided by provider Pa. 

3. Deal chromosome - Represents the services that can be combined to benefit in 
some way, e.g. cost. A service combination is represented using 1 or 0 and a 
non-combination is represented using *. For example, services S2 and S4 are combined 
whereas service S5 is not combined with any other service. 

Individual 
{ 
  Service:   S2  S4  S5  S1  S3 
  Provider:  Pa  Pd  Pb  Pc  Pb 
  Deal:      1   1   *   0   0 
  Fitness:   10 
} 



Fig. 1. Problem Representation 
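The three-chromosome representation of Fig. 1 could be held in a small record type, as in the Python sketch below. The class and method names are hypothetical. The sketch also assumes that adjacent positions carrying the same deal marker (1 or 0) belong to the same deal and that `*` marks a service outside any deal; the text only states this explicitly for S2, S4 and S5.

```python
from dataclasses import dataclass


@dataclass
class Individual:
    services: list   # service chromosome, in route order
    providers: list  # provider chromosome, aligned with `services`
    deals: list      # deal chromosome: "1"/"0" for combined, "*" for not combined
    fitness: float = 0.0

    def deal_groups(self):
        """Return runs of adjacent services that share the same deal marker."""
        groups, current, marker = [], [], None
        for service, gene in zip(self.services, self.deals):
            if gene != "*" and gene == marker:
                current.append(service)
            else:
                if len(current) > 1:
                    groups.append(current)
                current = [service] if gene != "*" else []
                marker = gene if gene != "*" else None
        if len(current) > 1:
            groups.append(current)
        return groups


# The individual shown in Fig. 1.
fig1 = Individual(
    services=["S2", "S4", "S5", "S1", "S3"],
    providers=["Pa", "Pd", "Pb", "Pc", "Pb"],
    deals=["1", "1", "*", "0", "0"],
    fitness=10,
)
```

Under the stated assumption, `fig1.deal_groups()` yields the two combinations S2 S4 and S1 S3, with S5 bought on its own.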



Each individual has a fitness value associated with it. The fitness of an individual is 
the cost of purchasing the services, but it can also be the sum of one or more measures that 
are to be optimized. Two problems must be addressed in order to sum the values of 
the various measures: 

Order of magnitude problem - The values of the measures to be summed are of dif- 
ferent orders of magnitude and they are to be normalized. One way of normalizing is 
by converting a value to a percentage based on the largest value in its category. 







Minimization/maximization problem - The various measures are either to be mini- 
mized or maximized. For example, cost is minimized and bandwidth is maximized. 
The measures are to be converted so that all the measures are either minimized or 
maximized. Negating the appropriate values can solve this problem. If all the values 
are to be minimized then the values that are required to be maximized are multiplied 
by negative 1 in order to minimize them. 

The process of normalization and appropriate negation of measures is conducted 
before the GA search begins. Thus, the efficiency of the fitness function is not 
hindered. 
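The two preprocessing steps can be sketched in a few lines of Python. This is an illustration under simple assumptions: every measure is positive, all measures are converted to minimization, and the function names and sample values are invented.

```python
def normalize(values):
    """Express each value as a percentage of the largest value in its category."""
    largest = max(values)
    return [100.0 * v / largest for v in values]


def to_minimization(values, maximize):
    """Negate measures that should be maximized so that smaller is always better."""
    return [-v for v in values] if maximize else list(values)


def combined_measure(costs, bandwidths):
    """Sum of normalized cost (minimized) and normalized bandwidth (maximized)."""
    cost_part = to_minimization(normalize(costs), maximize=False)
    band_part = to_minimization(normalize(bandwidths), maximize=True)
    return [c + b for c, b in zip(cost_part, band_part)]


# Two candidate offers: (cost 50, bandwidth 10) and (cost 100, bandwidth 5).
scores = combined_measure([50, 100], [10, 5])
```

The first offer is cheaper and has more bandwidth, so it receives the lower (better) combined value.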

The outline of the genetic algorithm is given in Fig. 2. The GA creates the initial 
population using the information about the services required, their providers and the 
existing deals/combinations. Heuristics are applied on the initial population to im- 
prove the quality of the population. Individuals are selected from the old population to 
create a new population by applying the genetic operators on the selected individuals. 
The search process continues for a fixed number of generations or until there is no 
improvement in the quality of the best individual. Repair heuristics are applied on the 
best individual in an attempt to improve the quality of the solution further. The GA 
stores the best individual from the start of the search process and this individual is 
replaced when a better one is found as the search progresses. The search process can be 
terminated at any time and the current best individual would be the solution. Thus, the 
GA yields an anytime algorithm. Inevitably, optimizing on more than one criterion 
involves a trade-off between profit optimization and end-user satisfaction. The best 
solution found by the GA may not suit the client. To overcome this problem, the client 
can specify the number of solutions to be produced by the GA and then select the best. 
The time required by the GA to find 1 or n solutions is the same, unlike other search 
methods. 



1. Get data on Services, Providers and Deals. 

2. Create initial population. 

3. Apply initial population heuristics. 

4. Select individuals: Random or fitness based. 

5. On selected individuals apply genetic operators (mutation and crossover). 

6. Add new individuals to new population. Calculate fitness of each individual. 

7. Next Generation - Go to step 4. 

8. Apply repair heuristics to final solution. 



Fig. 2. Genetic Algorithm Outline 
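The outline in Fig. 2 can be sketched as a generic loop that keeps the best individual seen so far, which is what makes the search usable as an anytime algorithm. The Python sketch below is a simplification and not the authors' implementation: the heuristics of steps 3 and 8 are left out, selection is a two-way tournament, crossover returns a single child, lower fitness is better, and the problem-specific operators are passed in as hypothetical callables.

```python
import random


def genetic_search(initial_population, fitness, mutate, crossover,
                   generations=50, rng=None):
    """Minimize `fitness`; returns the best individual found so far."""
    rng = rng or random.Random(0)
    population = list(initial_population)
    best = min(population, key=fitness)  # anytime result, updated as we go
    for _ in range(generations):
        new_population = []
        while len(new_population) < len(population):
            # Step 4: fitness-based (tournament) selection of two parents.
            parent_a = min(rng.sample(population, 2), key=fitness)
            parent_b = min(rng.sample(population, 2), key=fitness)
            # Step 5: apply the genetic operators.
            child = mutate(crossover(parent_a, parent_b, rng), rng)
            # Step 6: add the new individual to the new population.
            new_population.append(child)
        population = new_population
        candidate = min(population, key=fitness)
        if fitness(candidate) < fitness(best):
            best = candidate
    return best


# Toy usage: individuals are pairs of numbers and the fitness is their sum.
toy_population = [(3, 4), (1, 9), (5, 2), (7, 7)]
toy_best = genetic_search(
    toy_population,
    fitness=lambda ind: sum(ind),
    mutate=lambda ind, rng: ind,
    crossover=lambda a, b, rng: (a[0], b[1]),
    generations=20,
)
```

Because `best` is only ever replaced by a strictly better individual, stopping the loop early still returns a valid solution.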

Genetic operators are applied on the individuals to improve the fitness of the 
individuals from one generation to another. Simple crossover and mutation operators 
produce illegal chromosomes when applied. For example, the one-point crossover, 
which replaces a certain proportion of a chromosome with an equal proportion of 
another chromosome, may produce chromosomes where the services are repeated or 
the service provider for a service is wrong. To avoid such problems special crossover 
and mutation operators are to be applied on the chromosomes of the individuals 
selected. The selection of individuals in the GA is either random or fitness based 
(tournament). Mutation operators were developed to introduce new service ordering, new 
providers or new deals. The crossover operator in the GA is responsible for 
transferring a deal from one parent to another to create two new children. The crossover 
operator ensures that the deal that is transferred from one parent to another will have 
minimum impact on the existing deals in the other parent. 

Two types of heuristics were developed to improve the performance and the quality 
of the solution produced by the GA. The initial population heuristic was developed to 
capitalize heavily on the sparseness of deals. In practice the space of deals is neces- 
sarily extremely sparsely populated. For example, if there are 100 services, there 
are combinations, and it would take a very, very long time to create all those 

combinations. Therefore, randomly generating deals when creating the initial popula- 
tion is futile as most randomly generated deals would not match the existing valid 
deals (in the repository). 

1. Initial population heuristic - The heuristic randomly selects valid 
deals/combinations from the repository and uses them to create new individuals in- 
stead of randomly creating combinations. 

2. Repair heuristic - This heuristic ensures all services that are not part of a 
deal/combination have the best (cheapest) service provider. 
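Both heuristics are simple to express in code. The Python sketch below is a hedged illustration: the repository layout, the price table and the function names `seed_deals` and `repair` are assumptions, and `*` is taken to mark a service that is not part of a deal.

```python
import random


def seed_deals(deal_repository, count, rng):
    """Initial population heuristic: draw valid deals instead of inventing them."""
    return [rng.choice(deal_repository) for _ in range(count)]


def repair(services, providers, deals, price):
    """Repair heuristic: give every service outside a deal its cheapest provider."""
    repaired = list(providers)
    for i, (service, gene) in enumerate(zip(services, deals)):
        if gene == "*":
            offers = price[service]  # provider -> price for this service
            repaired[i] = min(offers, key=offers.get)
    return repaired


# Invented data for illustration.
repository = [("S1", "S2"), ("S3", "S4", "S5")]
picked = seed_deals(repository, 5, random.Random(1))

prices = {"S2": {"Pa": 5, "Pc": 1}, "S5": {"Pb": 9, "Pd": 4}}
fixed = repair(["S2", "S5"], ["Pa", "Pb"], ["1", "*"], prices)
```

In the usage example S2 belongs to a deal and keeps its provider, whereas S5 is outside any deal and is switched to its cheapest provider.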



5 Brokering Scenario and Related Work 

The brokering algorithm is applied to a simulated real-world problem in the telecom- 
munications domain. The dynamic Virtual Private Network (VPN) service is a tele- 
communication service provided to users who want to set up a multimedia connection 
with several other users. The brokering algorithm is currently used to simulate dy- 
namic VPN service provisioning based on cost and quality of service like bandwidth 
and delay. Every link in a VPN is characterised by a vector expressing the quality of 
service properties of the link: bandwidth (Mbps), delay (μs), etc. and the cost of the 
link. Each link is considered as a service and links may be combined for the benefit of 
the service providers and/or users. The various link combinations can result in many 
benefits like lower cost and a higher bandwidth, a higher bandwidth and a lower delay, 
or just lower delay, to name a few. The task for the broker is to construct a network 
connection (a virtual link) between n users at the cheapest cost and good quality of 
service. The broker identifies a set of service agents, who represent various parts of 
the network, that are required to provide the network connection at the cheapest cost 
with good quality of service. 

Several commercial and academic auction houses have recently appeared on the 
Internet, but to our knowledge, this implementation is the first of its kind - where 
there are multiple service providers, possible repetition of services in a deal, multiple 
optimization criteria and the bids are not superadditive: cost(S1 ∪ S2) > cost(S1) + 
cost(S2). Sandholm [8] and Fujishima et al. [4] have addressed the conventional 
combinatorial auction problem but their approach to solving the problem is very different to 
our approach. Sandholm presented a Bidtree algorithm that performs a secondary 
depth-first search to identify non-conflicting bids. The Bidtree then uses an IDA* 







search strategy to solve the problem. Fujishima et al. have presented two methods for 
winner determination. The first method is a modified depth-first search applied on a 
structured search space to reduce runtime. Caching and pruning are also used to speed 
searching. The second method is a heuristic, market-based approach. It sets up a vir- 
tual multi-round auction in which a virtual agent represents each original bid bundle 
and places bids, according to a fixed strategy, for each good in the bundle. 



6 Experimental Setup and Results 

We conducted various empirical tests to evaluate the general performance of the GA 
by varying a) the number of services and b) the number of deals. The performance of 
the GA is compared to a modified depth-first search algorithm. The depth-first search 
algorithm examines all feasible deal and/or service combinations to find the optimal 
solution. We executed our programs, written in Java, on a PC (Pentium 200 MHz 
with 64 MB RAM) to get the results. All the results reported are averages over 25 different 
runs. In the case of the GA, each run was terminated when the optimal solution was 
found. The run times include the planning stage of the brokering algorithm and the 
optimization is based on a single criterion, i.e. minimization of cost of services. In the 
absence of real data we tested our algorithms against deals randomly generated using 
the service data in the broker’s repository. Each deal was created by randomly 
selecting services from 1..m services. Each service has 1 to 5 providers and an appropriate 
provider was also randomly selected to provide the selected service. The price of deals 
for n services is randomly distributed between [c(1-d), c(1+d)] where c = 
sum(price_1..price_n) and d = 0.1. We do not present any experiments varying the number 
of service providers, optimization criteria and deal length distributions (exponential or 
random) in this paper due to space constraints. 
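The deal pricing rule can be sketched as follows in Python. The text does not name the distribution, so a uniform draw over the interval is assumed here, and the function name and sample prices are invented.

```python
import random


def random_deal_price(single_prices, d=0.1, rng=random):
    """Draw a deal price from [c(1-d), c(1+d)], where c is the sum of the single prices."""
    c = sum(single_prices)
    # Assumption: prices are uniformly distributed over the interval.
    return rng.uniform(c * (1 - d), c * (1 + d))


# A deal over three services priced 10, 20 and 30 individually (c = 60).
sample_price = random_deal_price([10, 20, 30], rng=random.Random(42))
```

With d = 0.1 the sample deal is priced between 54 and 66, so a deal may turn out slightly cheaper or slightly more expensive than buying its services separately.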

The GA without any heuristics finds optimal solutions for small (10 services) 
problems. As the problem size increased the quality of solution decreased. The algo- 
rithm was tested on a large problem (100 services) and quality of solution was poor. 
The solution quality was poor because the GA generates random service
combinations (deals) in the hope of finding valid matching deals that lead to the optimal
solution. In practice, the number of valid deals is very small compared to the theoretical
number of combinations, so most of the randomly generated deals were invalid; that is,
valid deals are sparsely distributed in the space of combinations.
A heuristic was therefore developed to enable the GA to select valid deals (from the
repository) when creating the initial population. A good initial population is vital
because it means that part of the search is already completed. The valid deals are
combined by the crossover operator as the generations progress to find the optimal 
individual. The mutation operator was modified to select valid deals instead of creat- 
ing deals randomly. The heuristics (i.e. initial population and repair) and the genetic 
operators enabled the GA to find optimal solutions for large size problems quickly. 

To answer questions a) and b) we measured the run time of algorithm by varying 
the number of services (the number of deals is kept proportional to the number of 
services) and the number of deals (number of services is kept constant) and the results 
are shown in Fig. 3. 




A Brokering Algorithm for Cost & QoS-Based Winner Determination 127 





[Fig. 3 plot, two panels; legend in each panel: Genetic Alg., Depth-First]



Fig. 3. Run Time Comparison 

The GA demonstrates an excellent performance both in finding optimal solutions and 
as an anytime algorithm in comparison to the modified depth-first search algorithm. 
The depth-first search takes a much longer run time than the GA as the number of 
services/deals is increased. As expected from the theoretical results, the difference in 
run time of the algorithms is considerable when the number of services/deals is larger. 
The performance of the GA is marginally better than that of the depth-first search 
when the number of services is small. Both the GA curves in Fig. 3 grow sub-linearly 
on the logarithmic graph, suggesting polynomial-time performance. 



7 Conclusion 

We presented a new brokering algorithm for optimal winner determination in combi- 
natorial auctions. Determining the winners so as to minimize a set of measures (cost, 
bandwidth, etc.) is NP-complete. The brokering algorithm described is a new two- 
stage (planning and optimizing) algorithm for optimal anytime winner determination. 
The results from various experiments were provided. The experimental results show 
the GA finds optimal solutions very quickly compared to a modified depth-first search 
algorithm. Winner determination is computationally feasible for a larger range
of input sizes because the second stage of the algorithm capitalizes on the fact that the
space of service combinations is necessarily sparsely populated in practice. The GA
serves as an anytime algorithm because it keeps track of the best solution found from the start
of the search process. Task decomposition and coalition formation are important
aspects of service federation, and our system shows that it can be practical for agent-oriented
middleware to support this new brokerage functionality. The techniques developed
to solve the brokering problem can be applied to similar NP-complete problems
such as the constraint satisfaction problem.

In the future, the brokering algorithm will be incorporated into FIPA-OS[6]. FIPA- 
OS is an open source implementation of the mandatory elements contained within the 
FIPA 97 specification for agent interoperability. FIPA-OS is an experimental agent 
framework, originating from research at Nortel Networks' Harlow Laboratories in the 
UK. The brokering algorithm will be used to provide dynamic VPN (virtual private 



128 Aneurin M. Easwaran and Jeremy Pitt 



networks) service provisioning based on cost and quality of service. We will also be 
introducing a replanning component to the broker architecture, which will enable the 
broker to find an alternative solution to a task if one of the identified service agents fails
or is not trustworthy.



Acknowledgements 

We acknowledge support for this work (CASBAh project) from EPSRC, under grant 
GR/L34440. This project is being undertaken in collaboration with Nortel Networks 
and their support is gratefully appreciated. 



References 

1. Brewer, P. J., Plott, C.R.: A Binary Conflict Ascending Price (BICAP) Mecha- 
nism for the Decentralized Allocation of the Right to use Railroad Tracks. Int. J. 
of Industrial Organization, 14:857-886, (1996) 

2. Easwaran, A.M., Drossopoulou, S.: A Parallel Genetic Algorithm Approach To 
The Knife Change Minimisation Problem. In the Proceedings of the sixth Paral- 
lel Computing Workshop (PCW'96), Japan, (1996) 

3. Easwaran, A.M., Pitt, J., Poslad, S.: The Agent Service Brokering Problem As A 
Generalised Travelling Salesman Problem. Autonomous Agents, Seattle, WA 
USA, (1999) 

4. Fujishima, Y., Leyton-Brown, K., Shoham, Y.: Taming the Computational
Complexity of Combinatorial Auctions: Optimal and Approximate Approaches. 
International Joint Conference on Artificial Intelligence, Sweden, (1999) 

5. Erol, K.: Hierarchical Task Network Planning: Formalization, Analysis &
Implementation. PhD Thesis, Dept. of Computer Science, University of Maryland,
College Park, (1995)

6. FIPA-OS. http://www.nortelnetworks.com/fipa-os 

7. Lenat, D.B., Guha, R.V.: Building Large Knowledge-based Systems. Represen- 
tation and inference in the Cyc project. Reading, Massachusetts, Addison- 
Wesley, (1990) 

8. Sandholm, T.: An Algorithm for Optimal Winner Determination in Combinato- 
rial Auctions. International Joint Conference on Artificial Intelligence (IJCAI), 
542-547, Sweden, (1999) 

9. Smith, R. G., Davis, R.: Negotiation as a metaphor for distributed problem 
solving. Artificial Intelligence. (1983) 20:63-109 

10. Srivastava, S. S., Kumar, S., Garg, R. C., Sen, P.: Generalised Travelling Sales- 
man Problem Through n Sets of Nodes. CORS Journal. (1969) 97-101 




An Overview of a Synergetic Combination of Local
Search with Evolutionary Learning to Solve Optimization
Problems



Rasiah Loganantharaj and Bushrod Thomas 



Center for Advanced Computer Studies 
University of Louisiana, Lafayette, LA 70504 
{logan, bbt}@cacs.louisiana.edu



Abstract. We describe a method for solving combinatorial optimization
problems that combines the best aspects of local search and genetic algorithms. We
formulate combinatorial optimization problems as state space search problems.
While local search methods, such as hill climbing, are computationally efficient,
they suffer from local minima traps. Global search methods are guaranteed
to find optimal solutions, but are not always feasible. We favor a polynomial
time technique that delivers solutions closer to optimal by modifying the
search space of the local search method. We demonstrate our strategy on a single-machine
scheduling problem with two objective functions: (1) minimizing
average job completion time, and (2) minimizing total tardiness. We apply the
technique to optimally schedule the robot arm of an automated retrieval system.
Obtaining optimal solutions to such scheduling problems is computationally intractable,
but experimental results show our technique produces better solutions
than those found by a genetic algorithm with random key encoding.



1 Introduction 

Many interesting problems in artificial intelligence are NP-complete, and thus have
no known polynomial-time solutions. Using either a brute force or a systematic algorithm for
solving such problems is not feasible because these algorithms repeatedly traverse 
through an exponentially growing search space. When sub-optimal algorithms are 
used to solve such computationally intractable problems, the solution quality and the 
time to arrive at such solutions are both important. 

Primarily two techniques, namely local search and randomized methods, are used 
to obtain sub-optimal solutions to the intractable problems. Local search is guided by 
heuristics, and has the advantage of maintaining only a few nodes, on the order of
d, where d is the depth of the search tree. The time complexity of a local search method such
as hill climbing is also on the order of d. While local search methods use minimal time
and space, they suffer from local minimum traps. To alleviate these traps, simulated 
annealing [2] is typically used with local search methods. The effectiveness of simu- 
lated annealing is dependent on the landscape of the solution space and on how the 
virtual temperature of the annealing process is controlled. 



R. Loganantharaj et al. (Eds.): IEA/AIE 2000, LNAI 1821, pp. 129-138, 2000.
© Springer-Verlag Berlin Heidelberg 2000







The genetic algorithm (GA) is a randomized technique often used to solve optimization
problems [8]. This algorithm starts with an initial population consisting of a set
of chromosomes. Each chromosome of the population corresponds to a feasible solu- 
tion of a problem that we are trying to solve. A chromosome consists of a sequence of 
genes. New solutions are created from the population by performing operations such 
as crossover, linear interpolation or extrapolation, and mutations on some selected 
chromosomes. As the population grows with new offspring, stronger chromosomes 
are kept and weaker ones are removed. 

While the genetic algorithm has been successfully used to solve many optimization 
problems [8], it also suffers from the local minimum trap. Mutation is usually used to 
alleviate this problem. 

To solve combinatorial optimization problems, we embrace a new approach of 
modifying the solution space of a local search method such that the search method 
converges at, or closer to, the optimal solution to the original problem [1], [6]. 

This paper is organized as follows. The introduction is followed by a brief overview
of the approach. Section 3 addresses the details of discovering the evaluation function.
In Section 4, we apply the technique to solve easy and hard problems. The paper
concludes with a summary and discussion in Section 5.



2 Overview of the Approach 

Any combinatorial optimization problem can be modeled as a search problem that 
starts with an initial state and progresses towards a feasible solution in a manner that 
minimizes or maximizes the objective function as required. While a global search 
method can guarantee to find the optimal solution when applied with an appropriate 
heuristic function, the search-space and time to find the optimal solution grow expo- 
nentially to the extent that such methods are not practical for many real world prob- 
lems. On the other hand, a local search method such as hill climbing uses space and 
time linear in the depth of the search tree. The major disadvantage of a local search
method is that it tends to become trapped at a local minimum, and hence the quality of the
solution is not guaranteed to be the best. However, if we can change the landscape of 
the solution space of the local search method such that a local search method leads to 
the optimal or near optimal solution, we achieve the best of both the local and the 
global search methods. That is, we get the solution quality of a global search method 
while spending time proportional to that of a local search method. 

How do we change the landscape of the solution space of a local search method? 
Changing the evaluation function used by the local search method can change the 
landscape of a solution space. The challenging aspect of such alteration of the solu- 
tion space is that it must facilitate the convergence of the local search closer to, or at 
the best solution of the original problem. 

An alternative to handcrafting a local heuristic evaluation function is to discover
such a function by applying machine learning techniques, which have been successfully
applied to solve combinatorial optimization problems. Loganantharaj obtained some preliminary
results in applying evolutionary learning to improve the performance of a
ground processing scheduling system (GPSS) [6] by discovering the local heuristic
function. Justin Boyan [1] applied reinforcement learning techniques to learn local
heuristic evaluation functions to solve some combinatorial optimization problems.



3 Discovering Evaluation Functions 

We assume that the problem is formulated such that a local search converges to a 
feasible solution, and that the quality of the solution is dependent on the evaluation 
function. Suppose $p_1, p_2, \ldots, p_r$ are relevant parameters of the problem. A local heuristic
evaluation function, say $H$, may be approximated as the weighted summation of
the normalized value of the parameters:

$$H = \sum_{k=1}^{r} w_k \, p_k . \qquad (1)$$

In the formula, $w_k$ and $p_k$ are respectively the weight and normalized value of parameter
$k$. Also, assume that $w_k$ takes any value from -1 to +1, including 0, for all $k = 1$
through $r$. Initially, each weight is assigned some random value within the range from
-1 to +1, and the local search uses the evaluation function $H$ as given in Equation 1 to
converge. After the local search converges, the quality of the solution is evaluated by
using the objective function of the problem. 
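
Equation 1 and its use inside a greedy local search step can be sketched as below. The function names are hypothetical, and the convention that the search prefers the candidate with the highest $H$ is an assumption made for illustration.

```python
def evaluate(weights, params):
    """Weighted sum H = sum_k w_k * p_k over normalized parameters (Equation 1)."""
    if len(weights) != len(params):
        raise ValueError("one weight is needed per parameter")
    return sum(w * p for w, p in zip(weights, params))


def greedy_step(candidates, weights):
    """Pick the candidate whose normalized parameter vector scores highest under H.

    Preferring the highest H is an assumed convention for this sketch.
    """
    return max(candidates, key=lambda params: evaluate(weights, params))
```

With negative weights, a candidate with smaller parameter values scores higher, so the same search code changes behavior purely through the weight vector.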

Discovering the evaluation function for the local search becomes the problem of 
discovering the set of weights that lead the local search to a solution that minimizes 
(maximizes) the objective function of the problem. The problem of discovering such 
weights can be solved by a genetic algorithm. In our formulation, each weight vector
forms a chromosome, thus randomly generated weight vectors form the chromosome 
pool. Using each weight vector chromosome, the local search finds the feasible solu- 
tion and computes the objective value for the solution. The objective value of the 
solution becomes the fitness value of the chromosome. A genetic algorithm goes 
through the convergence process by applying crossover, linear interpolation or ex- 
trapolation, and mutation. When a genetic algorithm converges, it yields the weight 
vector that corresponds to minimizing (maximizing) the fitness function, which is
the same as the objective function of the original problem.

Once the weight vector is trained, it is used with the local search method to minimize
(maximize) the objective function of the problem. If all the problem instances con- 
verge to the same weight vector during the training period, it is very likely that the 
local search method with the trained weight vector will minimize (maximize) the 
objective function of any instance of the problem. 



4 Applications 

To study our technique, we consider two problems. First is the conceptually simple 
problem of scheduling a single machine. Second, we consider the practical problem 
of retrieving thousands of small parts of different types and sizes on demand. 







4.1 Single Machine Scheduling Problem 

In a typical single-machine scheduling problem, jobs arrive at different times. Each job
has a different duration, and there is a non-zero preparation time to make the appropriate
changes in machine configuration to make it ready for the incoming job.



Table 1. Notation for the rest of this paper

Description                             Notation
Duration of job k                       Dur(k)
Completion time of job k                Comp(k)
Scheduled time / start time of job k    Sch(k)
Due date of job k                       Due(k)


Using notation from Table 1, average completion time for N jobs can be defined: 

$$\text{Average completion time} = \frac{1}{N} \sum_{j=1}^{N} \bigl(\mathrm{Sch}(j) + \mathrm{Dur}(j)\bigr) . \qquad (2)$$

Let us consider a simplified version of the problem first: All the jobs arrive at the 
same time, and there is no preparation time. This objective can be optimized in
polynomial time: schedule all jobs in ascending order of their duration. We
show that our method discovers this strategy of favoring the shortest job first. 
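
As a small worked illustration of Equation 2 in the simplified setting (all jobs available at time zero, no preparation time), the sketch below compares an arbitrary order with the shortest-job-first order. The three durations are illustrative values only.

```python
def average_completion_time(durations):
    """Average of Sch(j) + Dur(j) when jobs run back to back in the given order."""
    start = 0
    total = 0
    for dur in durations:
        completion = start + dur   # Sch(j) + Dur(j)
        total += completion
        start = completion         # the next job starts when this one finishes
    return total / len(durations)


durations = [20, 15, 25]                                      # illustrative durations
in_given_order = average_completion_time(durations)           # (20 + 35 + 60) / 3, about 38.33
shortest_first = average_completion_time(sorted(durations))   # (15 + 35 + 60) / 3, about 36.67
```

Running the short jobs first lowers the average because every job's completion time includes the durations of all jobs scheduled before it.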

The relevant parameter of this problem is the duration of a task. Supposing “dlong” 
is the longest task duration, we use d/dlong as the normalized value of a task with duration
d. We randomly generated a training set with arbitrary durations.



Table 2. A portion of the training set

Jobs    Duration    Normalized Value (dlong=30)
1       20          0.666
2       15          0.500
3       25          0.833



We applied the techniques to learn the strategy to solve the problem. The chromo- 
some has only one gene and its value is generated randomly between -1 through 1. 
The algorithm converged with the best chromosome of -1 indicating that favoring the 
shortest job is the best strategy in minimizing the total summation of the completion 
time. This experiment was repeated for job sets ranging from 10 to 100 jobs. We consistently
obtained the same result, confirming that the shortest job first policy achieves the 
minimal average waiting time, as we expected. 
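
The single-gene experiment can be approximated with the sketch below. It is not the authors' implementation: the population size, elitist truncation selection, averaging crossover (a form of linear interpolation), Gaussian mutation, and the use of a sort as the stand-in for the local search are all assumptions. In this simplified version every negative weight produces the same shortest-first ordering, so the sketch can only show that the learned weight is negative, not that it settles at -1.

```python
import random


def average_completion_time(durations):
    """Average completion time of jobs run back to back in the given order."""
    start = total = 0
    for dur in durations:
        start += dur
        total += start
    return total / len(durations)


def schedule_with_weight(w, durations):
    """Stand-in for the local search: order jobs by descending H = w * d / dlong."""
    dlong = max(durations)
    return sorted(durations, key=lambda d: w * d / dlong, reverse=True)


def fitness(w, durations):
    """Objective value (lower is better) of the schedule induced by weight w."""
    return average_completion_time(schedule_with_weight(w, durations))


def learn_weight(durations, pop_size=20, generations=30, seed=0):
    """Evolve a one-gene chromosome in [-1, 1]; all settings are assumed values."""
    rng = random.Random(seed)
    pop = [rng.uniform(-1, 1) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda w: fitness(w, durations))
        survivors = pop[: pop_size // 2]            # keep the stronger half
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = rng.sample(survivors, 2)
            child = (a + b) / 2                     # linear interpolation of two parents
            if rng.random() < 0.2:                  # occasional mutation
                child += rng.gauss(0, 0.3)
            children.append(max(-1.0, min(1.0, child)))
        pop = survivors + children
    return min(pop, key=lambda w: fitness(w, durations))
```

Because the survivors are carried over unchanged, the best weight found so far is never lost between generations.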

Not all single machine scheduling problems can be solved polynomially. Consider 
an objective function of minimizing the total tardiness. A job is said to be tardy if it is 
completed after its due date. The total tardiness is given by Formula (3), where 
max(a,b) returns the maximum of a and b: 

$$\text{Total tardiness} = \sum_{j=1}^{N} \max\bigl(0,\ \mathrm{Sch}(j) + \mathrm{Dur}(j) - \mathrm{Due}(j)\bigr) . \qquad (3)$$

If a job completes before its due date, we assume there is no reward and that the pen- 
alty is zero. It has been shown [3] that even this simplified problem is NP-hard. 

The relevant parameters for this problem are the duration and the due date of a job. 
The due date and the duration are normalized. A snapshot of the data used for training 
is given in Table 3. The chromosome has two genes; one corresponds to the weight 
for the duration and the other one corresponds to the weight for the due date. As in 
the previous experiment, chromosomes are generated randomly and each gene
takes any value from -1 to 1, including the boundary values.



Table 3. Snapshot of training data

Job    Duration    Due Date
1      10          33
2      12          41
3      13          59
4      15          103
5      11          95
6      12          32
7      12          15
8      11          19
9      12          33
10     18          40



After convergence, we got weight vector [-0.93, -0.93] for the training set. We ran 
the learning algorithm on randomly generated data sets for jobs ranging from 10 to 
90. The weight vector changed with the data set, but it was close to vector [-1,-1]. 



Table 4. Results for single machine scheduling problem

Number of Jobs    Local Heuristic    Best of GA    % Improvements
10                200                205           2.5
10                116                96            -17.2
10                172                93            -45.9
50                14,038             13,271        -5.5
50                12,306             11,992        -2.5
50                15,521             14,715        -5.2
90                35,702             36,111        1.1
90                26,700             30,688        14.9
90                20,564             30,564        48.6



We used weight vector [-1, -1] to schedule jobs to minimize total tardiness. The 
priority for each job was computed as the summation of the two parameters: negation 
of the normalized duration, and negation of the normalized due date. The schedule is 
obtained by arranging the jobs in descending order of their priority. The total tardi- 
ness of each schedule is computed. The qualities of the schedules (total tardiness) are 
compared with ones obtained by applying a randomized genetic algorithm technique. 
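
The priority rule described above can be sketched as follows, using the job data of Table 3. Several details are assumptions made for this illustration: both parameters are normalized by dividing by their maximum over the job set, all jobs are available at time zero, and there is no preparation time.

```python
def priority_schedule(jobs, weights=(-1.0, -1.0)):
    """Order jobs by descending priority w_dur * dur_norm + w_due * due_norm.

    jobs is a list of (job_id, duration, due_date) tuples. Dividing by the
    maximum duration and maximum due date is an assumed normalization.
    """
    dlong = max(dur for _, dur, _ in jobs)
    due_max = max(due for _, _, due in jobs)
    w_dur, w_due = weights

    def priority(job):
        _, dur, due = job
        return w_dur * dur / dlong + w_due * due / due_max

    return sorted(jobs, key=priority, reverse=True)


def total_tardiness(schedule):
    """Sum of max(0, completion - due date) for jobs run back to back from time 0."""
    clock = 0
    total = 0
    for _, dur, due in schedule:
        clock += dur
        total += max(0, clock - due)
    return total


# (job, duration, due date) rows from Table 3
jobs = [(1, 10, 33), (2, 12, 41), (3, 13, 59), (4, 15, 103), (5, 11, 95),
        (6, 12, 32), (7, 12, 15), (8, 11, 19), (9, 12, 33), (10, 18, 40)]
schedule = priority_schedule(jobs)
order = [job_id for job_id, _, _ in schedule]
tardiness = total_tardiness(schedule)
```

Since the whole schedule comes from a single sort, the rule is cheap to evaluate even for large job sets.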






For comparison with randomized scheduling algorithms, we tested random key en- 
coding [7] and feature-based encoding [4]. Features considered were job duration and 
the interval corresponding to the difference between the due date and the duration. 

Each problem was generated randomly using parameters such as minimum and 
maximum duration, and maximum due dates. Feasible due dates are those that are 
greater than the duration of jobs. The cost in Table 4 represents the total tardiness for 
each run using the given encoding technique. There is no relation between different 
rows; the result must be compared along each row. 



4.2 Automated Storage Retrieval System 

Assume that parts are sorted and organized based on their type; parts of a specific 
type are stored in a particular bin of a two-dimensional arrangement of bins, which
we call a tray. For minimizing the workspace, assume that the trays are formed as 
vertical carousels. Assume that a track mounted, linearly actuated robot is being used 
to retrieve a part from the given bin (location) and deposit the part onto a transporta- 
tion device (conveyor belt). When a product needs to be retrieved, the storage device 
moves so that the appropriate tray is presented to the robot through a service window. 
The robot travels along the length of the storage device and retrieves the product. The 
vertical carousel, in addition to increasing the number of different types of parts ac- 
cessible by the robot on a given tray, helps to maximize the utilization of three- 
dimensional workspace. This is an important factor, considering spiraling real estate 
costs. Vertical carousels also tend to have higher throughput rates. 




Figure 1. Illustration of robot arm motion 

In order to develop a solution that maximizes the throughput, it is necessary to 
model the behavior of the robot arm. Assume that the arm has to pick up objects at 
two locations, {x1, y1} and {x2, y2}, in the order given. After picking the object at
{x1, y1}, the robot arm moves parallel to the Y-axis to y0 (out of the storage device
and onto the conveyor belt) in order to drop the object. Then it moves to the next
location {x2, y2}. To simplify the model, let us assume that the robot first moves
parallel to the X-axis (along the length of the storage device) to location x2, and then
into the storage device to location y2. With this simplification, retrieving an object
from a bin can be modeled as moving laterally to the appropriate x location, then to
the y location, and then back out of the storage device. The time to pick up and drop
the object onto the conveyor belt can be assumed to take some constant time (approximated
at 2 seconds). Therefore, picking up an object can be modeled as taking a
constant time (2 sec.) plus the set-up time for the arm to move to the X-coordinate of
the current object from the current position, assuming that the new item is also stored
in the same tray as the previous one. There are 33 rows across the X-axis, and it takes
about 0.1 seconds to move from one row to the next. If the parts are not in the same
tray, the set-up time will also include the time involved in waiting for the target tray to
reach the robot arm’s operating space. The time taken to bring the next tray to the operating
space of the robot is about 2.3 seconds.

A location of an item in the storage bin, say (x, y, z), represents the row number, 
the column number and the tray number. Let us assume that rows are numbered from 
1 through 33, columns are numbered 1 through 10, and the trays (z) are numbered 1 
through 10. If $\Delta x$ and $\Delta z$ represent the (absolute) displacements in row and tray number
from the current position to the position of the object, the following formula computes
the time to retrieve the part:

$$\text{Time to retrieve} = \Delta x \cdot C_x + \Delta z \cdot C_z + C_f . \qquad (4)$$

In the above formula, $C_x$ and $C_z$ are, respectively, the time required to move the
robot arm per row, and to move the storage tray by one unit up or down to the robot’s
working space. $C_f$ is the constant time for fetching the object. For example, the time
required to retrieve an object at (3, 4, 5) just after fetching an object from (6, 2, 7) is
equal to $3 C_x + 2 C_z + C_f$. For this particular arrangement of the robot retrieval
system, $C_x = 0.1$ seconds, $C_z = 2.3$ seconds, and $C_f = 2$ seconds.
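
The retrieval-time formula can be checked with a short sketch. The function and constant names are hypothetical; locations are (row, column, tray) triples as defined above, and the column does not enter the cost because the in-and-out motion is folded into the constant fetch time.

```python
CX = 0.1   # seconds to move the arm by one row
CZ = 2.3   # seconds to bring the next tray into the working space
CF = 2.0   # constant seconds to pick up and drop an object


def retrieval_time(current, target, cx=CX, cz=CZ, cf=CF):
    """Time to fetch the object at target when the arm last fetched from current."""
    x0, _, z0 = current
    x1, _, z1 = target
    return abs(x1 - x0) * cx + abs(z1 - z0) * cz + cf


t = retrieval_time((6, 2, 7), (3, 4, 5))   # 3*Cx + 2*Cz + Cf, about 6.9 seconds
```

Because a tray change costs 23 times as much as a row move, orderings that group items by tray tend to be much cheaper under this model.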



Table 5. Comparison of solution technique performance on storage retrieval problem

         GA with RK Encoding    Approach of this Paper    Percentage Improvement
Items    ATT      Comp          ATT      Comp             ATT      Comp
100      212      460           128      253              39.62    45.00
100      233      507           127      257              45.49    49.31
100      301      647           126      249              58.14    61.51
200      632      1411          231      453              63.45    67.90
200      545      1201          227      463              58.35    61.45
200      590      1308          222      455              62.37    65.21
300      949      2003          323      662              65.96    66.95
300      913      2000          327      655              64.18    67.25
300      850      1858          327      654              61.53    64.80


The objective is to improve the performance of the retrieval system, which can be 
measured in terms of the average turnaround time of the components in addition to 
other quantities such as the total schedule length. Suppose there are N items and they 
are picked up in the order from 1 to N. Also assume that item k is retrieved and 
dropped at the conveyor belt at time tk. The average turnaround time is equal to
sum(t1, t2, ..., tN)/N and the schedule length is tN. The objective function for the
scheduling problem is to minimize average turnaround time for the ordered objects. 
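
The two performance measures can be computed for a given pick-up order as sketched below. The function name and the explicit start-position argument are assumptions for illustration; the per-item cost follows Equation 4 with the constants given earlier.

```python
def turnaround_stats(order, start, cx=0.1, cz=2.3, cf=2.0):
    """Return (average turnaround time, schedule length) for a pick-up order.

    order is a list of (row, column, tray) locations visited in sequence and
    start is the arm's initial location.
    """
    clock = 0.0
    position = start
    finish_times = []
    for item in order:
        clock += abs(item[0] - position[0]) * cx + abs(item[2] - position[2]) * cz + cf
        finish_times.append(clock)          # t_k: item k reaches the conveyor belt
        position = item
    average = sum(finish_times) / len(finish_times)
    return average, finish_times[-1]


items = [(3, 4, 1), (3, 2, 2)]
att, length = turnaround_stats(items, start=(1, 0, 1))
att_reversed, length_reversed = turnaround_stats(items[::-1], start=(1, 0, 1))
```

In this two-item example, visiting the item in the starting tray first gives both a lower average turnaround time and a shorter schedule than the reversed order.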

We implemented the computer model of the robot arm behavior as described in 
section 4. The robot’s initial location is set to (1, 0, 1), and we generate items randomly
in each cell of each tray distributed across 10 trays. When generating and placing
items, we apply some bias; for example, for each tray selected we distribute 4 items. We
ran the experiment for items ranging from 100 to 300 (see Table 5). We also ran simi- 
lar experiments for items distributed completely randomly (see Table 6). 

We compared the results of our method to a well-known, effective technique for 
solving scheduling problems, a genetic algorithm with random key encoding. In Tables
5 and 6, the column title ATT stands for the average turnaround time for retrieving an
object. The next column title, Comp, stands for schedule length, or the total completion
time. The tables also show the percentage improvement of our approach over the
results obtained via a genetic algorithm with random key encoding. The ATT and Comp
values are shown in seconds.
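
The percentage columns appear to be the relative reduction with respect to the genetic algorithm's value. This formula is inferred from the tabulated numbers rather than stated in the text, and the function name is hypothetical.

```python
def percent_improvement(baseline, ours):
    """Relative reduction of our value with respect to the baseline, in percent."""
    return 100.0 * (baseline - ours) / baseline


att_gain = percent_improvement(212, 128)    # first row of Table 5, ATT: about 39.62
comp_gain = percent_improvement(460, 253)   # first row of Table 5, Comp: 45.00
```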



Table 6. Results with items stored in bins randomly

         GA with RK Encoding    Approach of this Paper    Percentage Improvement
Items    ATT      Comp          ATT      Comp             ATT      Comp
50       106      240           72       158              32.08    34.17
50       127      282           74       154              41.73    45.39
100      232      506           130      251              43.97    50.40
100      304      655           128      262              57.89    60.00
150      451      985           174      367              61.42    62.74
150      411      901           176      357              57.18    60.38
200      660      1387          230      454              65.15    67.27
200      751      1558          227      452              69.77    70.99
250      869      1817          276      551              68.24    69.68
250      726      1587          275      552              62.12    65.22
300      1022     2170          327      660              68.00    69.59
300      1066     2180          326      651              69.42    70.14



5 Summary and Discussion 

In this paper, we described a general method for solving combinatorial optimization 
problems that combines the best aspects of both the local and global search methods. 
That is, it has the computational time efficiency of a local search method with quality 
as good as that of a global search method. Typically, a local search method traverses 
through the solution space of the problem and becomes stuck at the local minimum 
(local maximum). The converged value of the solution is typically not close to the 
best solution and is not satisfactory for many practical problems. Simulated annealing 
is often used to alleviate the local minima trap, but the method has limitations and it is 
not guaranteed to converge closer to the best solution. The idea behind our method is 
to modify the solution space of the local search method such that it will converge at 
or near the best solution of the problem. Through trial-and-error procedures, several
researchers have cleverly come up with local evaluation functions to solve combinatorial
optimization problems with some encouraging results.

In order to modify the solution space for the local search as well as to have the
desirable property of converging at or close to the optimal solution, we use an evolutionary
learning algorithm to discover the evaluation function. We have demonstrated
our strategy on a certain class of scheduling problems. We applied it to a single-machine
scheduling problem with two different objective functions. The first
instance sought to minimize average completion time of jobs. This problem has a
polynomial-time solution; scheduling tasks in ascending order of their duration achieves the
optimum solution. Our training program consistently generated the weight vector [-1] 
for all training data sets, indicating the priority for jobs with shorter duration. This is 
exactly the same result as obtained by solving the problem optimally. 

When the objective function is to minimize total tardiness, optimally solving the 
problem is no longer computationally tractable. We applied our technique to train and 
discover a strategy to solve the problem sub-optimally. We found that the weight 
vector varies with training data set. This did not surprise us, since the optimum solu- 
tion of the problem is computationally intractable. Even though the weight vector is 
dependent on the training set, it varies around vector [-1,-1]. We use this vector to 
schedule tasks heuristically. Using the specification of a problem with job duration 
and due dates, a schedule is created by arranging the jobs in descending order of pri- 
ority. The priority of each job is the summation of negated normalized duration and 
negated normalized due date. Sorting the jobs according to their priority computationally
dominates the other step of normalizing the duration and the due dates. Therefore,
the computational time complexity of scheduling jobs is the same as that of sorting
(for example, quicksort on average), which is O(N log N), where N is the number of jobs.
From the results of Table
4, the average performance of the local search algorithm using the discovered weight
vector is as good as that of the best results of the randomized techniques. The heuris- 
tic strategy performs better in larger problems. 

Next, we applied the technique to optimally schedule the robot arm of an auto- 
mated retrieval system. Optimal solutions are computationally intractable. Simulated 
experimental results show our method’s solutions to be far superior to those
obtained by a random key encoding genetic algorithm. For each test set, the algorithm
learns a weight vector whose values varied within a range. From converged values of 
the weight set, we guessed weight vector [-0.1, -1]. We ran all tests with this vector 
and got the same results in 95% of trials. In the remaining 5% of tests, we got slightly 
higher values (less than 2%) for average turnaround time and total completion time. 
This is another domain where the technique is successfully applied. 

We have not yet addressed how to determine the learned outcome for the weight 
set when the converged value of weights (chromosome) changes with different prob- 
lem instances. This difficulty appears when solving computationally intractable prob- 
lems. To sketch a systematic method to solve the problem, suppose we run the learn- 
ing algorithm for several instances of the problem and collect all the weight vectors. 
If all the vectors coincide, it indicates there is a polynomial solution to the problem 
and the outcome of the training process is the weight vector. Otherwise, we can categorize
the converged weight vectors into three groups: (1) a well-defined single cluster,
(2) multiple well-defined clusters, and (3) completely scattered points. These categories
are shown in Figures 2, 3 and 4 for a two-dimensional parameter problem. When
the trained vectors fall into category one, very satisfactory results can be obtained by 
using the center of the cluster with the local search method. When the converged 
weight vectors fall into class 2, the local search method is run as many times as the 
number of clusters and the best is chosen as the solution. The last category is the 
hopeless one. We cannot learn any evaluation function from it, and so must use the 
weight convergence algorithm to solve each problem instance. 
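
The three cases can be illustrated with a small sketch. Everything in it is an assumption made for illustration: the greedy radius-based grouping, the `radius` and `max_clusters` thresholds, and the `cost` callback (which stands in for running the local search with a given weight vector and returning its objective value) are hypothetical and are not the procedure used in the experiments.

```python
import math


def centroid(members):
    """Coordinate-wise mean of a non-empty list of vectors."""
    n = len(members)
    return tuple(sum(coords) / n for coords in zip(*members))


def group_vectors(vectors, radius):
    """Greedy grouping: a vector joins the first cluster whose center is within radius."""
    clusters = []
    for v in vectors:
        for members in clusters:
            if math.dist(v, centroid(members)) <= radius:
                members.append(v)
                break
        else:
            clusters.append([v])
    return clusters


def choose_weights(vectors, cost, radius=0.2, max_clusters=3):
    """Return a weight vector to reuse, or None if the vectors are scattered."""
    clusters = group_vectors(vectors, radius)
    if len(clusters) > max_clusters:
        return None  # category 3: fall back to learning per problem instance
    centers = [centroid(c) for c in clusters]
    # Category 1 has a single center; category 2 tries each center and keeps the best.
    return min(centers, key=cost)


single = [(-0.1, -1.0), (-0.12, -0.98), (-0.08, -1.02)]
picked = choose_weights(single, cost=lambda w: abs(w[0] + 0.1) + abs(w[1] + 1.0))
scattered = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0), (3.0, 3.0)]
fallback = choose_weights(scattered, cost=lambda w: 0.0)
```

In practice the thresholds would have to be tuned to the scale of the weights; the sketch only shows how the choice between cluster centers and the fallback could be organized.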




Fig. 2. Single cluster    Fig. 3. Multiple clusters    Fig. 4. Scattered points

The proposed strategy is not a silver bullet for every combinatorial optimization
problem. Its success depends first on capturing all the relevant parameters, and
then on whether a linear weighted combination of these (normalized) parameters
leads to the desired result. Though more work needs to be done, the strategy we
present in this paper is very promising.



References 

1. Boyan, J.: Learning Evaluation Functions for Global Optimization and Boolean Satisfiability.
Proceedings of AAAI (1998)

2. Davis, L.: Genetic Algorithms and Simulated Annealing, Research Notes in Artificial
Intelligence. Morgan Kaufmann, New York (1987)

3. Du, J., Leung, J.Y.: Minimizing Total Tardiness on One Machine is NP-Hard. Mathematics
of Operations Research, Vol. 15 (1990) 483-495

4. Grass, J., Zilberstein, S.: Anytime Algorithm Development Tool. Sigart Bulletin, Vol. 7, No.
2 (1996)

5. Leon, V.J., Balakrishnan, R.: Strength and Adaptability of Problem-Space based
Neighborhoods for Resource-constrained Scheduling. OR Spektrum, Vol. 17. Springer-Verlag,
Berlin Heidelberg New York (1995) 172-182

6. Loganantharaj, R., Thomas, B.: Improving the Efficiency of the Ground Processing
Scheduling System. NASA/ASEE Summer Faculty Research Report (1997)

7. Norman, B.A., Bean, J.C.: Random Keys Genetic Algorithm for Job Shop Scheduling.
Technical Report 94-5, Dept. of Industrial and Operations Engineering, The University of
Michigan (1994)

8. Michalewicz, Z.: Genetic Algorithms + Data Structures = Evolution Programs, Third
Revised and Extended Edition, Springer (1996)



Maintenance of KBS’s by Domain Experts 

The Holy Grail in Practice 



Arne Bultman^1, Joris Kuipers^1, and Frank van Harmelen^2

^1 ASZ Research & Development
Kronenburg A-toren, Postbus 8300, 1005 CA Amsterdam, The Netherlands
{arne.bultman, joris.kuipers}@asz.nl
^2 Dept. of AI, Faculty of Sciences, Vrije Universiteit Amsterdam
Frank.van.Harmelen@cs.vu.nl



Abstract. Enabling a domain expert to maintain his own knowledge 
in a Knowledge Based System has long been an ideal for the Know- 
ledge Engineering community. In this paper we report on our experience 
with trying to achieve this ideal in a practical setting, by building a 
maintenance tool for an existing KBS. After a brief survey of various ap- 
proaches to this problem described in literature, we select a domain- and 
task-specific modelling approach as the most promising and appropriate. 
First, we construct a domain ontology and a task model for the KBS
to be maintained, as well as a task analysis of the maintenance
tool itself. The maintenance tool is subsequently implemented using a
two-layer architecture which separates domain and system concepts.
Although no full-scale evaluation has been undertaken, we report on our
initial experience with this approach and present our conclusions. 



1 Motivation 

In Software Engineering it is well-known that the majority of the costs for soft- 
ware projects are not encountered during design or construction phases, but 
rather during the maintenance phase ([9]). The costs of the maintenance phase 
have been quoted to be as high as 80% of the total costs over the entire lifetime 
of a software project. 

Although less data are available for Knowledge Based Systems, there is no 
reason to believe that this situation in Knowledge Engineering is any different. 
A response to this problem that has appeared over the years in the Knowledge 
Engineering literature is to “take the Knowledge Engineer out of the loop”.
The Knowledge Engineering process involves (at least) three parties: the in- 
tended end-user of the system, the domain expert who provides the expertise 
that forms the basis of the system, and the Knowledge Engineer, who acquires 
the knowledge of the domain expert and uses it to construct the system. The 
hope of Knowledge Engineering has long been to remove the Knowledge Engineer 
from the maintenance loop, and provide tools that enable the domain expert to 
maintain the system and adjust it to changing knowledge requirements. 



R. Loganantharaj et al. (Eds.): IEA/AIE 2000, LNAI 1821, pp. 139-149, 2000.
© Springer-Verlag Berlin Heidelberg 2000






In the traditional situation, with the Knowledge Engineer in the loop, every 
required update to the knowledge base must be communicated by the domain 
expert to the maintenance team of Knowledge Engineers, who implement the 
change. This change must then be validated by the domain expert, who may 
suggest further or other changes. Only at the end of this iterated process can 
the new version of the system be released to field users. In the ideal situation, a
domain expert uses appropriate tools to directly implement the required changes 
in the system without repeated, time-consuming and error-prone interaction with 
a Knowledge Engineer. 

This paper describes a particular field-deployed Knowledge Based System for 
which we have tried to live up to this ideal: given a realistic KBS, is it possible 
to provide tools that enable the domain expert to take on much of the system 
maintenance? 

As an indication of the possible gains to be had in our application domain, 
we mention that the KBS with which we are concerned is used in 28 different 
locations, while a team of only two Knowledge Engineers is available to deal 
with several dozens of requested updates per year. 

In Sect. 2 we describe the historical development of the tools investigated to
realize the ideal of “maintenance by domain experts”. This results in a conclusion
on the most appropriate way to tackle the maintenance problem in our application
domain. The application domain is briefly described in Sect. 3. Section 4
describes the conceptual modelling effort that was needed to enable maintenance
by domain experts. The actual maintenance tool based on this conceptual
model is described in Sect. 5. Section 6 briefly discusses the results, and Sect. 7
concludes.

2 Literature 

Van Heijst et al. [10] identify four separate activities which are supported by
Knowledge Engineering tools:

1. Model construction: constructing an abstract and generic model of the 
knowledge types required to perform a certain task. Examples of such tasks 
are classification, diagnosis, configuration, etc. 

2. Model instantiation: such an abstract and generic model must be instanti- 
ated with knowledge from a specific domain. Any tool supporting this phase 
must of course be informed about the generic model and how the various 
knowledge types are represented in this model. 

3. Model compilation: most often, the models constructed in the previous 
steps are not directly executable. In such a case, the instantiated model must 
be somehow “compiled” into an executable form. 

4. Updating the instantiated model: When the resulting system does not
function as required (or no longer does so), either because incorrect
knowledge has been entered or because the domain knowledge has changed, the
knowledge in the model must be updated.






It should be clear to the reader that the tool support we report on in this
paper is concerned with exactly this fourth phase of Van Heijst’s list.

In the Knowledge Engineering literature of the past twenty years or so, a 
number of different approaches can be identified for how to provide tool support 
for this fourth phase. We will briefly discuss these below. 

Rule-Based Editors. The first Knowledge Engineering tools were simply edi- 
tors to update sets of production rules in a knowledge base. Such tools were often 
based on existing systems (EMYCIN [11], for example, was based on MYCIN [8]), 
and allowed the user to make updates to an already existing system. 

An abstract specification of either task or domain was not available. For 
example, all that EMYCIN “knew” about MYCIN was that it used backward- 
chaining rules and hierarchical relations between concepts. As a consequence, 
little or no support was given for updating the knowledge in such a model, and 
the user had to be familiar with the internal workings of the system. 

Task- and Method-Specific Architectures. Because support from such rule- 
based editors was insufficient, subsequent research was aimed at tools based on 
a specific task-model. Two important advantages derived from the fact that 
these tools were informed about the specific task-model that was underlying the 
system. First, because such task-models closely correspond with the notions that 
are used by human domain-experts, these domain experts could communicate 
much more easily with these tools. Second, such tools could provide much more 
support during the instantiation and updating of these models. An example of 
such a system is SALT [7], which implements the propose-and-revise method for 
the parametric design task. 

These tools of course required research into such task-specific models. An
early example of this was the work by Clancey on hierarchical classification [3].
Much of the Knowledge Engineering research in the 80’s was dedicated to
identifying such generic tasks [2].

Integrated Environments. The task-specific architectures from the 80’s were
mostly aimed at instantiating models (phase 2), and provided little support
for the other phases mentioned above. More integrated Knowledge Engineering
environments have since been constructed which provide support for the
other phases as well. Two well-known examples of such integrated environments
are Protégé-II [4] and EXPECT [5].

These environments provide support for phases 1 to 3 (constructing,
instantiating and compiling models). Typically, they only provide support for
updating models (phase 4) when these models were initially constructed within
the same environment.

Our Approach. In our case-study we are dealing with an existing KBS, which 
has not been designed or engineered with any of the existing KBS support envi- 
ronments, which means that no ready-made tool support for phase 4 is available. 






Nevertheless, it is clear from the literature that the only approach to supporting
model adaptation by domain experts is to base the tool on a conceptual model of the
system that is close enough to the concepts that are familiar to domain experts.

As a result, we have decided to proceed by first building a conceptual model 
of the existing application, and subsequently using this conceptual model as the 
basis for a maintenance tool. 

Before we discuss the conceptual model of our application (Sect. 4) and the 
maintenance tool built on top of this model (Sect. 5), we briefly describe our 
application domain in the next section. 



3 The ISB-system 

In this section we will give an overview of the ISB-system in its current state. 
We will start with a short history of the system, explaining why it was built and 
what it does. Furthermore, we will describe the deployment of the system and 
its users. 

ISB stands for “IndelingsSysteem Bedrijven”, which means Company Classification
System. Development of ISB began as part of a graduation project in 1994; it was
later expanded into a full system and came into use two years ago. Its task is to
classify employers into one of fifty-five sectors. Classification of an employer
is necessary to determine the level of various insurance contributions for the
Dutch social security system and is based on the primary activity of the
employer. The decision to build this system was made because of a lack of
consistency in the classifications made by different people and a decreasing
number of experts in this domain.

Over the years, the system has grown from a small prototype
to a fully fledged application containing over 1500 rules organized in
approximately 250 different modules. Initially the system ran on the VAX platform
and was built using AionDS 6.4 [1]. Nowadays it is developed using AionDS 7 and
runs on the Windows platform using a GUI.

ISB is used on a daily basis in 28 offices of the Gak company, a Dutch
social security administrator. The classification process is, in theory, completely
covered by legislation. In practice, however, the law leaves a lot of room for
interpretation, especially when ‘new’ activities, such as the publishing of
CD-ROMs, are concerned. Moreover, ISB is not (yet) fully complete and correct; users
of the system often report bugs and shortcomings. As a result, a
lot of maintenance is performed on the system.

Since the application has been developed and maintained by many different 
programmers, each having a different programming style, the structure of the 
application has degraded over time. Added functionality and the existence of 
deprecated functions contributed to this process. This means that the conceptual 
model on which the original prototype was based can hardly be recognized in 
the current application. 






4 Conceptual Model of the ISB-system 

In this section we will discuss a conceptual model we extracted from the ISB 
system in cooperation with the domain expert. A good conceptual model of 
the system to be maintained is important, especially when this system is as 
inconsistent in its implementation as the ISB-system. As we already explained 
in the previous section, the original conceptual model on which ISB was founded 
is outdated and incomplete. Therefore, we constructed a new conceptual model, 
consisting of a task decomposition of the system and a domain ontology. These 
will be discussed in this section. 

4.1 Task Decomposition 

A new task decomposition of the system was made, representing the tasks the
system performs. Once the various tasks are recognized, it becomes possible to
identify the tasks on which maintenance is performed regularly. These tasks
can then be examined to see if they are suitable for tool-supported maintenance.

We will briefly discuss the tasks the ISB system performs. 

First, the system performs some pre-checks. These determine, for example, whether
the employer has no main office in the Netherlands. Such cases are
treated differently and often do not need further classification.

After this, the user is asked to enter one or more entries which describe the
employer’s activities. These entries can be descriptions or verb-noun
combinations. There are several thousand possible entries. Because many entries are
very similar, they can often be mapped onto a single entry which will be used in
the reasoning process. Furthermore, additional entries can be derived from the
given entries and added for use in the reasoning process. When this is done,
dependencies between the existing entries are determined to see if certain activities
are performed in service of other activities (e.g., delivery of fabricated goods).

Based on these dependencies, the primary activities of the employer are
determined. This is done by asking the user about the nature of these activities.
This dialog between the system and the user is guided by a decision tree inside
the ISB system. Only one of these primary activities determines the employer’s
social function. Each primary activity is classified into its corresponding sector.
If there are several primary activities which are classified differently, the social
function is determined by the activity for which the highest wages are paid or,
if wages are equal, by the expected level of these wages.
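
The selection rule for the social function can be made concrete with a short sketch. The `PrimaryActivity` type, its field names, and the sector numbers are hypothetical; ISB itself is rule-based and its internal representation is not shown in this paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PrimaryActivity:
    name: str
    sector: int            # hypothetical sector number (one of the fifty-five)
    wages_paid: float
    expected_wages: float  # expectation of the future wage level


def social_function(activities):
    """Pick the primary activity that determines the employer's social function."""
    if not activities:
        raise ValueError("at least one primary activity is required")
    # Highest wages paid wins; the expectation is only used to break ties.
    return max(activities, key=lambda a: (a.wages_paid, a.expected_wages))


activities = [
    PrimaryActivity("retail", 17, 100.0, 90.0),
    PrimaryActivity("publishing", 9, 100.0, 120.0),
    PrimaryActivity("delivery", 32, 40.0, 200.0),
]
chosen = social_function(activities)
```

Here "retail" and "publishing" pay equal wages, so the higher expectation decides in favour of "publishing", and the employer would be classified into that activity's sector.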

After studying the system and talking to the domain expert, it soon became 
clear that the task on which maintenance was performed most often is the classi- 
fication of primary activities. Also, maintenance on other tasks often originates 
from this maintenance. Thus, the primary focus of our maintenance tool is on 
this task. 

4.2 Domain Ontology 

An ontology describing the domain in which the ISB-system operates served as 
a means of communication with the domain expert. This includes the communication
with us during the design process as well as the communication with the
tool during the performance of maintenance. Using concepts from this ontology 
allows one to communicate with the expert using his own view on the domain, 
instead of using implementational concepts such as rules, inference mechanisms, 
etc. 

An ontology also makes dependencies between domain concepts explicit. 
These dependencies clarify how changes involving certain concepts influence 
other concepts. 

Figure 1 shows a part of the domain ontology we constructed. We used UML 
as our modelling language. For those not familiar with UML, a good introduction 
can be found in [6]. Note that relations are read from left to right, unless stated 
otherwise; a social function determines a classification, not the other way around. 
To give an impression of the size: the figure shown represents about one third of 
the total ontology. 



[Figure 1: UML class diagram. Recoverable classes: Employer, Company, Activity,
Primary Activity, Secondary Activity, Wage (attribute amount), Total Wage (derived
attribute /amount: total sum of wages paid by the same employer), Expectation,
Dependency, Social Function, Classification. Recoverable associations: pays, costs,
performs_a, has_a, determines. Constraint shown: {complete, disjoint}.]

Fig. 1. Part of the domain ontology



5 The Maintenance Tool 

In this section we discuss the maintenance tool. First, we present a task compo- 
sition of the tool to give an overview of the tasks the tool performs. Following 
this, we describe the global design we used to implement the tool. Finally we 
will give a short impression of the final prototype. 






5.1 Task Composition of the Tool 

In order to be able to perform maintenance on the ISB system, the tool has to 
perform a number of tasks. These are depicted in Fig. 2. 

The task Knowledge Acquisition involves the interactive acquisition of the
expert’s knowledge as well as acquiring the knowledge resident in the ISB system.
The task Knowledge Representation consists of visualizing the acquired
knowledge for the expert and translating this knowledge to its ISB representation.
The acquired knowledge also needs to be interpreted in order to understand how
the system classifies collected entries (Entry Interpretation) and, if this is done
by means of a dialog with the user, how this dialog is constructed (Dialog
Interpretation). Lastly, the acquired knowledge needs to be verified to ensure that it
represents a valid classification (this is not to be confused with validation of the
knowledge!).



[Figure 2: task tree. Root: ISB-knowledge maintenance. Subtasks: Knowledge
Acquisition (Interactive Acquisition, ISB Import); Knowledge Representation
(Visualisation, ISB Export); Knowledge Interpretation (Entry Interpretation,
Dialog Interpretation); Verification. Further leaf tasks shown: Entry Collection,
Knowledge Refinement, Knowledge Modification, Dialog Visualisation.]

Fig. 2. Task composition of the maintenance tool



5.2 Global Design of the Tool 

We wanted the design of the tool to be as generic as possible. More specifically, 
the design should be flexible enough to ensure that changes in the maintained 
system as well as changes in the user’s concepts only require changes to their 
respective implementation in the tool. This requires a strict separation of these 
two concept spaces. This was realized by using the global design shown in Fig. 3. 

On the left, there is an expert layer which contains the objects that
represent the expert’s concepts, such as decision trees and dialog-based
classification. On the right, there is an ISB layer which contains objects that represent
ISB-specific concepts such as rules and parameters. To enable communication
between the two layers, there are two interfaces: the expert layer interface (ELI)
contains knowledge about the objects in the expert layer, and the ISB layer
interface (ILI) contains knowledge about the objects in the ISB layer. On top of
this, there is a GUI which is used to communicate with the user. It shows graphical
representations of expert objects and enables the user to manipulate them.







Fig. 3. Global Model of the maintenance tool 



Let us give an example to illustrate the principle behind this design. Suppose 
the expert has created a new dialog. What happens when he presses the Save- 
button is the following. The dialog asks the ELI to save it. The ELI then requests 
the ILI to create the necessary ISB-concepts in order to save the dialog. The ILI 
will then create the needed objects, asking the ELI about the structure of the 
dialog. The created objects are then saved in the ISB system. 
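
The save sequence just described can be sketched in a few lines. The class names, method names, and the textual rule format below are hypothetical stand-ins chosen for illustration; the actual tool targets AionDS, whose rule syntax is not reproduced here.

```python
class ExpertLayerInterface:
    """ELI: knows the structure of expert-layer objects."""

    def __init__(self, ili):
        self._ili = ili

    def save(self, dialog):
        # The ELI asks the ILI to create the ISB concepts needed for the dialog.
        self._ili.store_dialog(dialog, self)

    def structure(self, dialog):
        # Expose the dialog as (question, answer, conclusion) triples.
        return [(dialog.question, a, c) for a, c in dialog.branches.items()]


class ISBLayerInterface:
    """ILI: the only place that knows how ISB concepts are written."""

    def __init__(self, isb_store):
        self._store = isb_store  # stand-in for the ISB system

    def store_dialog(self, dialog, eli):
        # The ILI asks the ELI about the dialog's structure, then builds rules.
        for question, answer, conclusion in eli.structure(dialog):
            self._store.append(self.make_rule(question, answer, conclusion))

    @staticmethod
    def make_rule(question, answer, conclusion):
        # Hypothetical rule syntax; a syntax change would only touch this method.
        return f"IF {question} = {answer} THEN classification = {conclusion}"


class Dialog:
    """Expert-layer object: one question with answer -> conclusion branches."""

    def __init__(self, question, branches, eli):
        self.question = question
        self.branches = dict(branches)
        self._eli = eli

    def save(self):
        self._eli.save(self)  # the dialog asks the ELI to save it


store = []
eli = ExpertLayerInterface(ISBLayerInterface(store))
Dialog("hasEmployees", {"Yes": "sector 12", "No": "sector 30"}, eli).save()
```

Because the expert-layer `Dialog` never touches the rule format, replacing `make_rule` is enough to follow a change on the ISB side.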

We found that using this design has several advantages. First, changes made
to one layer only require changes in its corresponding interface. If, for
instance, the ISB system were restructured so that the syntax of the rules
changed, the only thing which would have to be adapted is the method in the ILI
which generates these rules. Also, this construction would make it easier to
implement different views on the system for different types of users. Additionally, the
separation of the two layers allows developers to work on the layers individually,
as long as the interfaces are defined.

5.3 The Prototype 

Using the global design and the task analysis, a prototype which implemented
most of the required features was built. The prototype allows the expert to
make new classifications for given entries, representing them in expert-layer
terms that are understandable to the expert. Figure 4 shows the most
interesting screen of the prototype, the dialog editor. The dialog editor enables
the expert to view and construct a decision tree which represents the dialog to
be executed by the ISB system. It consists of three parts:

— Decision Tree Window. This part of the screen shows the actual tree.
Buttons represent parameters and conclusions. The possible values of a
parameter are determined by the possible answers to its associated question. These
values are shown in the white boxes on the lines between the buttons. Leaves
of the tree represent conclusions and are printed in italics.

— Edit Window. Whenever a button is pressed in the decision tree window,
the corresponding properties of its parameter/conclusion are shown here.
Each type of button has its own property tab. Using this tab, the user can,
for example, select the question to be asked or create a new question.






— Status Window. As long as the tree is not finished, the missing or
incorrect items are reported here. If, for example, a conclusion does not yet have a
classification, a warning is given. The tree can only be saved when no
more warnings are present. This ensures that the tree is logically complete
and correct and that every conclusion is valid (NOT necessarily the right
one). Whenever a new classification is saved, it is translated to a collection
of rules (ISB concepts).
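
The completeness check and the translation to rules can be sketched as follows. The node types, the warning texts, and the tuple-based rule representation are hypothetical and serve only to illustrate the idea; they are not taken from the prototype.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Conclusion:
    name: str
    classification: Optional[str] = None  # None while still undefined


@dataclass
class Parameter:
    name: str
    values: list                                   # possible answers to the question
    children: dict = field(default_factory=dict)   # answer -> subtree


def warnings_for(node):
    """Collect the items that would be reported in the status window."""
    if isinstance(node, Conclusion):
        return [] if node.classification else [f"{node.name} not defined"]
    found = []
    for value in node.values:
        child = node.children.get(value)
        if child is None:
            found.append(f"{node.name}: no branch for value {value}")
        else:
            found.extend(warnings_for(child))
    return found


def to_rules(node, conditions=()):
    """Translate each root-to-leaf path into one (conditions, classification) rule."""
    if isinstance(node, Conclusion):
        return [(conditions, node.classification)]
    rules = []
    for value in node.values:
        rules.extend(to_rules(node.children[value], conditions + ((node.name, value),)))
    return rules


def save(tree):
    """Refuse to save while warnings remain; otherwise return the rules."""
    problems = warnings_for(tree)
    if problems:
        raise ValueError("; ".join(problems))
    return to_rules(tree)


pending = Conclusion("Poss.Solution2")
tree = Parameter("hasEmployees", ["Yes", "No"], {
    "Yes": Conclusion("Poss.Solution1", "sector 12"),
    "No": Parameter("hasOffice", ["Yes", "No"], {
        "Yes": pending,
        "No": Conclusion("Poss.Solution3", "sector 30"),
    }),
})
open_items = warnings_for(tree)
```

Once `pending.classification` is set, `save(tree)` returns one rule per leaf. Note that this checks only that every conclusion has some classification, not that it is the right one.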



[Figure 4: screenshot with three panes. Decision Tree Window: parameters
hasEmployees and hasOffice with Yes/No branches leading to possible solutions.
Edit Window: name hasEmployees, question S0012, possible values Yes and No,
default value Yes, a list of entries by which the value can be determined without
asking a question, and the buttons Change, Add entry, Process and Save.
Status Window: warnings that Poss.Solution1, Poss.Solution2 and Poss.Solution3
are not defined.]

Fig. 4. Screenshot of the dialog editor (translated)



6 Evaluation 

No full-scale evaluation was performed using the tool described in the previous
sections, but some initial results with domain experts were obtained, and these 
are reported in this section. 

A domain expert (who is not an IT specialist) was able to build several new 
classification dialogs corresponding to existing update-requests using the tool 
after just 5 minutes of explanation. 

This result is all the more encouraging since the domain expert that evaluated 
the maintenance tool was not the same domain expert that we used during the 
knowledge acquisition stages. The fact that this did not cause any substantial 
difficulties suggests to us a common frame of mind for both experts. It seems that 
the underlying ontology used (implicitly) by both experts is sufficiently similar 
that such an ontology is a good basis for a usable maintenance tool, thereby 
confirming our main hypothesis. 






Notwithstanding this early success, a serious shortcoming was identified in 
the prototype of our tool. Because of irregularities in the implementation of the 
ISB system, it was not possible to provide a uniform translation procedure from 
the ISB layer to the expert layer. As a result, existing dialog-based classifications
can only be updated by providing an entirely new definition for the dialog; the
existing dialog cannot be reconstructed and modified.

7 Conclusions 

The main points we have argued in this paper are that 

1. it is possible to build maintenance tools that are usable by domain experts; 

2. such tools should be based on conceptual models that are close to the domain 
experts (comprising both task-model and domain ontology of the system to 
be maintained); 

3. such task-models and domain ontologies can be (re)constructed after the fact
for existing systems, if this is required;

4. a principled two-layer architecture can be used to implement the connec- 
tion between a maintenance tool (based on a conceptual model) and the 
performance system that is to be maintained. 

In particular the construction of a domain ontology has been essential in the 
development of a maintenance tool that was usable by domain experts. Such an 
ontology captures the conceptual notions that the expert is familiar with. This 
ontology can then be connected with the structures in the actual implementation 
of the performance program in order to realize actual maintenance operations on 
this performance program. This does not amount to simply adding an ontology 
to an existing system. Instead, the far-reaching effect of introducing the explicit
ontology is that the domain expert performs maintenance operations on the 
conceptual model (which he understands), rather than on the implementation 
system. 



References 

1. Computer Associates. PLATINUM Aion.
HTTP://www.ca.com/products/platinum/appdev/aiori-ps.htm.

2. B. Chandrasekaran. Generic tasks in knowledge based reasoning: High level
building blocks for expert system design. IEEE Expert, 1(3):23-30, 1986.

3. W. J. Clancey. Heuristic classification. AI, 27:289-350, 1985.

4. Henrik Eriksson, Angel R. Puerta, Mark A. Musen, John H. Gennari, Thomas E.
Rothenfluh, and Samson W. Tu. Custom-tailored development tools for
knowledge-based systems. Knowledge Systems Laboratory, Medical Computer Science,
January 1994.

5. Yolanda Gil and Marcelo Tallis. Transaction-based knowledge acquisition:
Complex modifications made easier. In Proceedings of the Ninth Knowledge Acquisition
for Knowledge-Based Systems Workshop. Banff, February 1995.






6. Craig Larman. Applying UML and Patterns. Prentice Hall PTR, 1997.

7. S. Marcus and J. McDermott. SALT: A knowledge acquisition language for
propose-and-revise systems. Artificial Intelligence, 39(1):1-38, 1989.

8. E. H. Shortliffe. Computer-Based Medical Consultations: MYCIN.
American-Elsevier, New York, 1979.

9. I. Sommerville. Software Engineering. Addison Wesley, Bonn, Germany, 1987.

10. G. van Heijst, A.Th. Schreiber, and B. J. Wielinga. Using explicit ontologies in
KBS development. International Journal of Human-Computer Studies, 45:183-292,
1997.

11. W. van Melle. A domain independent production rule system for consultation
programs. IJCAI, 1979.



A Simulation-Based Procedure 
for Expert System Evaluation 



Chunsheng Yang^1, Kuniji Kose^2, Sieu Phan^1, and Pikuei Kuo^3

^1 National Research Council, Montreal Rd, M-50, Ottawa, ON, K1A 0R6, Canada
{chunsheng.Yang, Sieu.Phan}@iit.nrc.ca
^2 Faculty of Engineering, Hiroshima University, Higashi-Hiroshima, Japan
^3 National Taiwan Ocean University, Keelung, Taiwan, Republic of China
pkkuo@mail.ntou.edu.tw



Abstract. It is well recognized that the evaluation of the knowledge-based
system is very important and difficult in the development of expert systems in
domains such as aviation and navigation. To alleviate some of the
difficulties, and to reduce the development cost, the authors proposed an
evaluating procedure based on simulation. The proposed procedure was
designed to validate the functionality and capability of an expert system that
was developed to provide decision-making support in ship navigation. In this
paper, the developed expert system for collision avoidance is first outlined, and
the procedure to evaluate the developed expert system by using a
simulation-based approach is presented in detail. It is also concluded that the
simulation-based procedure is feasible and effective for evaluating expert systems
in a number of different domains.



1 Introduction 

In the development of real-world problem expert systems such as collision avoidance 
in ship navigation or intelligent alarm correlation in telecommunication, the 
evaluation of the developed knowledge-based system is one of the most important 
stages. Before a knowledge-based system can be deployed, it must be evaluated for 
accuracy. Generally, the knowledge-based systems designed to solve real-world 
problems are very large and contain thousands of rules; thus verification becomes 
difficult. First, it is hard to examine possible interactions of rules simultaneously. 
Second, in order to carry out verification checks, it is necessary to obtain certain 
additional information such as observable variables. Third, it is difficult to set criteria 
for each observable variable. There has been a great deal of research in the field
of verification and validation (V&V) of knowledge-based systems, with many
achievements. These results focus on V&V theory and
techniques [1,2,3,5], V&V systems and specifications, and V&V applications [4,6,7].
Some examples are as follows: M. Benerecett [1] applied a model checking technique
to multiagent system verification; D. Fensel [7] deployed KIV (Karlsruhe Interactive
Verifier) for the verification of conceptual and formal specifications of
knowledge-based systems; and M. Ramaswamy [4] presented a technique based on directed
hypergraphs that enables developers to determine the overall integrity of the rule
bases by verifying partitions locally. All of these techniques are very useful for
evaluating the

R. Loganantharaj et al. (Eds.): IEA/AIE 2000, LNAI 1821, pp. 149-160, 2000. 

© Springer-Verlag Berlin Heidelberg 2000 




150 Chunsheng Yang et al. 



knowledge-based systems from the viewpoint of the accuracy of knowledge modeling 
and rule bases. However, evaluation of a knowledge-based system should also be 
carried out in a realistic environment that allows domain experts to evaluate 
the system directly. It is impractical to test and evaluate the 
developed expert system directly in a real application, because doing so is too costly 
and unsafe, especially in the domains of aviation and navigation. To reduce the cost of 
development and to perform evaluation in a realistic environment, simulation is 
considered to be the most effective and feasible approach, especially for 
knowledge-based systems in the ship navigation and aviation domains. The authors 
suggest two kinds of simulation: fast-time simulation and real-time simulation. Fast-time 
simulation is a scenario-based approach to testing knowledge-based systems. It is 
used to test the basic problem-solving ability of the knowledge-based 
system and to fine-tune the knowledge bases. Real-time simulation provides a realistic 
environment, which gives human operators a real-world feeling and allows domain 
experts to evaluate the problem-solving ability of the expert system directly. Based on 
these simulations, we propose an evaluating procedure for domain-oriented expert 
systems such as a collision avoidance expert system. The evaluating procedure includes 
four steps: identifying evaluation purposes, identifying evaluation items and indices, 
designing and performing simulation tests, and analyzing results. In this study, the 
authors developed a collision avoidance expert system to assist navigators in the 
decision-making process in ship navigation [8,9,10]. We then evaluated the 
developed system with the proposed evaluating procedure. In this paper, we 
outline the developed expert system for collision avoidance in Section 2; the 
simulation-based evaluating procedure is presented in Section 3; and the evaluation of 
the developed expert system using the proposed procedure is 
discussed in Section 4. Conclusions are given in the last section. 



2 Developed Expert System for Collision Avoidance 

The goal of the collision avoidance expert system is to assist ship navigators in their 
decision-making process to avoid collision. The developed knowledge-based system 
has been incorporated into the Integrated Navigation System (INS) [8,9] at Hiroshima 
University as an intelligent decision-making support subsystem [10]. In this section, 
the outline of the developed expert system is presented in order to describe how to 
evaluate it in later sections. 

It is well known that the procedure followed by a captain during ship handling to 
avoid collision consists of collecting information, assessing the encounter situation, 
determining the collision avoidance action, and executing the action. In making a 
decision to avoid collision, the captain chooses the collision avoidance action 
using the encounter situation, traffic regulations, his experience, and judgment 
based on visual information. The feasibility of such a decision is limited by the 
constraints of the visual field, misunderstanding of the information and of the actions 
of the target ships, errors in executing the actions, and so on. When the captain 
decides to avoid a collision, he can focus his attention only on the most dangerous 
target and cannot pay attention to the other target ships encountering his ship, 
because of his limited information-processing capability. Therefore, the authors proposed to develop a 




A Simulation-Based Procedure for Expert System Evaluation 151 



knowledge-based collision avoidance support system, which is able to provide 
decision-making support for operators. In other words, such a support system should 
be able to compensate for human limitations and to propose an effective maneuvering 
action to operators. Such an expert system should possess the following abilities: 

• Sophisticated problem-solving ability; 

• Full safe navigating ability; 

• Ability to predict target ships' actions; 

• Interactive ability; and 

• Real-time responding ability. 

To reach these objectives, the authors concentrate on the development of a 
knowledge-based system for identifying an effective collision avoidance action. To 
analyze multi-ship encounter situations effectively, the authors introduce a target ship 
classification method and the concepts of the most dangerous ship, the dangerous ship, 
the restricting ship, the indifferent ship, and the unblocking scope of ship handling 
space. To improve the feasibility and safety of the collision avoidance action, the 
authors propose a method, called prediction of plural action, for predicting the actions 
of target ships. Figure 1 shows the inference sequence of the system. Depending on 
the encounter situation and the safety evaluation, the inference engine selects 
the necessary sequence for the inference procedure. The inference procedure uses 
the knowledge bases built for the developed expert system. As shown in Figure 2, the 
collision avoidance expert system is designed with a hierarchical architecture and 
a modularized knowledge structure. The top layer in the system is the inference control, 
which is responsible for controlling the inference procedure. The second layer is the 
main knowledge bases, which include the classification of target ships, the prediction 
of target ships’ actions, the identification of the method of collision avoidance, and the 
establishment of course-line waypoints. The third layer contains the knowledge modules 
of every knowledge base. The fourth layer includes the preliminary knowledge 
modules such as traffic regulations, identification of target ships, and so on. The 
system consists of the following inference sequences: 

(1) Prediction of Target Ship’s Scheduled Action 

First, the system uses a knowledge base to classify the navigation environment into 
one of three categories: open sea, coastal, or route navigation. The prediction of the 
target’s scheduled action depends on this classification: in the case of open sea or 
coastal navigation, the target is predicted to maintain its current course and speed, 
and in the case of route navigation, it is predicted to follow the route. 

(2) Classification of Target Ships 

After the collision risk of the target ships is computed using the Nagasawa risk 
model [11], the target ships are classified as the most dangerous, the dangerous, the 
restricting, or the indifferent ship depending on their risk of collision. A dangerous 
ship is defined as a ship whose risk of collision exceeds the safe level when both 
ships maintain their scheduled course lines. In the case of several dangerous ships, the 
most dangerous ship is the one with the highest risk among them. A restricting ship is 
defined as a ship that will cause no danger if own ship and the target ship maintain their 
scheduled course lines, but that will frustrate the action of own ship if she takes a 
collision avoidance action for a dangerous ship. A target ship that lies outside the 
maneuvering space of own ship is defined as an indifferent ship. 








Fig. 1 The Inference Sequence of the System 

(3) Prediction of Target Ship’s Collision Avoidance Action According to the 
Classification of the Ship 

For the dangerous ships and restricting ships, their collision avoidance actions are 
predicted by using the same approach and knowledge base as for own ship. 
The predicted collision avoidance actions of the target ships are incorporated into 
the procedure when own ship’s collision avoidance action is formulated. 

(4) Establishment of the Course Line Waypoints as Collision Avoidance Action 

The collision avoidance action of own ship is formulated primarily against the most 
dangerous ship. The action in own ship’s maneuvering space is evaluated 
in light of the predicted actions of the dangerous ships and the restricting 
ships. As a result, the most efficient and feasible action is selected. 
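
The classification rules of step (2) can be illustrated with a minimal Python sketch. It assumes that the collision risk of each target has already been computed (the paper uses the Nagasawa risk model [11], which is not reproduced here). The `Target` fields, the `safe_level` threshold, and the handling of targets that match none of the four definitions are illustrative assumptions, not the authors' implementation.

```python
from dataclasses import dataclass


@dataclass
class Target:
    name: str
    risk: float                 # risk if both ships keep their scheduled course lines
    blocks_avoidance: bool      # would frustrate own ship's avoidance action (assumed given)
    in_maneuvering_space: bool  # lies inside own ship's maneuvering space (assumed given)


def classify_targets(targets, safe_level):
    """Return {target name: class label} following the definitions in step (2)."""
    dangerous = [t for t in targets if t.risk > safe_level]
    # The most dangerous ship has the highest risk among the dangerous ships.
    most_dangerous = max(dangerous, key=lambda t: t.risk, default=None)
    classes = {}
    for t in targets:
        if t is most_dangerous:
            classes[t.name] = "most dangerous"
        elif t.risk > safe_level:
            classes[t.name] = "dangerous"
        elif not t.in_maneuvering_space:
            classes[t.name] = "indifferent"
        elif t.blocks_avoidance:
            classes[t.name] = "restricting"
        else:
            # Assumption: the paper does not name this case; treated as indifferent.
            classes[t.name] = "indifferent"
    return classes
```

In such a sketch, the most dangerous ship would drive the formulation of own ship's action in step (4), while the dangerous and restricting ships would constrain it.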

3 Simulation-Based Evaluating Procedure 

After building the prototype of the expert system using Nexpert Object^, the C 
language, and a Unix platform (SUN workstation), we had to test and evaluate the 
developed expert system and demonstrate its capability to domain experts. 
Otherwise, it would be impossible to deploy the developed system in the real application 



^ Nexpert Object is an expert system development environment which can provide production knowledge 
representation and object knowledge representation. It can also provide a powerful GUI and API for 
developers to develop knowledge-based systems. 












field. Therefore, how to test and evaluate the expert system becomes very important. 
Because different domains have different requirements, application 
backgrounds, technical support, and targets, it is very difficult to 
propose a single approach and set of criteria for evaluating expert 
systems in different domains. 

Fig. 2 The Architecture of the Developed Collision Avoidance Expert System 

Of course, one might argue that the best way to test the system is 
to try it out in a real application environment. However, this would greatly increase 
the cost and could cause unexpected incidents. Domain operators cannot 
accept such an approach. We therefore proposed a simulation-based procedure for 
evaluating the expert system. To evaluate the expert system systematically and 
effectively, we suggest that the simulation-based approach include the following 
steps: 



• Identifying evaluation purposes 

In terms of the requirements and the goals of the development, we need to 
identify the evaluation purposes for the expert system. 


















• Identifying evaluation items and their indices 

To meet the above evaluation purposes, we need to determine the evaluation 
items that reflect the original design requirements and evaluation 
purposes. To show that the evaluation items are feasible and 
acceptable, we need to specify some indices for each evaluation item. Such 
indices may be either quantitative or qualitative. 

• Designing and performing evaluation simulation 

Some of the above evaluation indices can be obtained by personal judgment 
from the designers or domain experts; some can be obtained from simulation. 
We suggest carrying out two kinds of simulation: fast-time 
simulation and real-time simulation. Fast-time simulation can be used to test 
the typical abilities of the expert system, to quickly identify problems in the 
knowledge base, and to fine-tune the knowledge base. Real-time simulation 
can be used to provide a realistic environment for testing the system. In 
real-time simulation, we ask the domain experts to evaluate the system, 
collect their comments and impressions, and then improve the 
system. To carry out such simulations, it is therefore necessary to design the 
simulation scenarios and to develop a simulation environment that allows operators 
to interact with the knowledge base. 

• Analyzing the results 

As described above, evaluation indices may be either quantitative or 
qualitative. Therefore, after carrying out the evaluation simulation, we need 
to determine these indices for judging the evaluation items by analyzing the 
simulation results quantitatively or qualitatively. 

The specific content of the above procedure may differ from one domain expert 
system to another. We therefore take our developed expert system as an 
example to describe how to evaluate expert systems using the proposed evaluating 
procedure. 



4 Evaluation of the Developed Expert System 

In this section, how to evaluate the developed expert system for collision avoidance 
decision-making support by using the proposed simulation-based evaluating 
procedure is discussed. 

4.1 Evaluation Purposes 

According to the requirements and development goals of our domain expert system, the 
evaluation purposes are to validate the capability of decision-making support, to test 
the problem-solving ability for collision avoidance, and to identify problems for 
further refinement of the knowledge base. 

4.2 Evaluation Items and Indices 

To meet the above objectives, we define the following items as the evaluating items in 
terms of the designing requirements and the targets of the developed system: 







• collision avoidance problem-solving ability; 

• ship navigation safety; and 

• decision-making support ability. 

The collision avoidance problem-solving ability is defined as the fact that no collision 
will happen during ship navigation and the collision avoidance action is reasonable 
and does not disturb other ships. The ship navigation safety is defined as the fact that 
the desired safety level must be satisfied during ship navigation. And the decision- 
making support ability is defined as the fact that the system should be able to provide 
the effective support for operator’s decision-making and to alleviate their burden on 
decision-making. In terms of these items, we define some indices that may be either 
quantitative or qualitative. The quantitative indices are: 

• safety level of own ship, 

• target ship risk related to own ship action, 

• deviation of course line, and 

• response time, 
and the qualitative indices are: 

• readable information of decision-making support, 

• feasibility of proposed action, and 

• accuracy of the collision avoidance action. 
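
The indices listed above can be recorded in a small data structure so that each one is tied to the way it is obtained in this section: quantitative indices from logged simulation data, qualitative ones from questionnaires answered by domain experts. The following Python sketch is only illustrative; the class and function names are assumptions, and the paper does not describe any such code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Index:
    name: str
    quantitative: bool

    @property
    def source(self):
        # Quantitative indices are derived from simulation data,
        # qualitative ones from the experts' questionnaire answers.
        return "simulation data" if self.quantitative else "questionnaire"


INDICES = [
    Index("safety level of own ship", True),
    Index("target ship risk related to own ship action", True),
    Index("deviation of course line", True),
    Index("response time", True),
    Index("readable information of decision-making support", False),
    Index("feasibility of proposed action", False),
    Index("accuracy of the collision avoidance action", False),
]


def indices_by_source(indices):
    """Group index names by the way they are obtained."""
    grouped = {}
    for idx in indices:
        grouped.setdefault(idx.source, []).append(idx.name)
    return grouped
```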

All of these indices have to be obtained from fast-time simulation or real-time 
simulation. For the qualitative indices, we asked domain experts to give their 
evaluation by using a questionnaire. For the quantitative indices, we derived the 
values from the simulation results by analyzing the collected traffic data. The safety 
level of own ship is calculated following Nagasawa’s risk model [11]. The target ship 
risk related to own ship’s collision avoidance action is defined as the risk level from 
the viewpoint of the target ship due to own ship’s collision avoidance action; it is also 
calculated with Nagasawa’s risk model. The deviation of course line is defined as the 
distance between the scheduled course line and the course line of the collision 
avoidance action. It reflects the cost of the collision avoidance action. 
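
As one possible reading of the course-line deviation, the Python sketch below treats the scheduled course line as a straight segment between two waypoints and reports the largest distance from any point of the avoidance track to that segment. The planar (x, y) coordinates in metres, the use of the maximum as the summary value, and the function names are assumptions; the paper does not give a formula.

```python
import math


def point_to_segment_distance(p, a, b):
    """Distance from point p to the segment a-b; all are (x, y) tuples."""
    ax, ay = a
    bx, by = b
    px, py = p
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:
        # Degenerate segment: both waypoints coincide.
        return math.hypot(px - ax, py - ay)
    # Projection parameter of p onto the segment, clamped to [0, 1].
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def course_line_deviation(scheduled_start, scheduled_end, avoidance_track):
    """Largest distance of the avoidance track from the scheduled course line."""
    return max(
        point_to_segment_distance(p, scheduled_start, scheduled_end)
        for p in avoidance_track
    )
```

A larger value would indicate a more costly avoidance maneuver; an average or an enclosed area could be substituted for the maximum if a different notion of cost is preferred.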



4.3 Evaluation Simulation 

To obtain the evaluation result of the above indices, two kinds of simulations are 
carried out. They are fast-time traffic simulation and real-time ship handling 
simulation. 

(1) Fast-Time Traffic Simulation 

Traffic simulation is carried out for testing basic problem-solving ability. We asked 
navigators to build typical encounter situation scenarios according to their navigating 
experience and international traffic regulations. Using these traffic scenarios [9], the 
simulation is done in the developed traffic simulation environment shown in 
Figure 3. This simulation environment includes a navigating environment system, 
a graphical user interface (GUI), the developed expert system, and a simulation result 
log system. The navigating environment system provides all of the necessary data for 
own ship, target ships, and navigating area (open sea, congested waterway, and route). 
The GUI shows simulation results, which are ship trajectories with time history, and 






allows the operator to interact with the knowledge-based system. The developed expert 
system is the system under evaluation. The log system records the history data of the 
simulation and can reproduce the recorded data to check the execution results of the 
knowledge-based system. These simulation data are analyzed in the final analysis 
step. 



Fig. 3 The Composition of Traffic Simulation System 

(2) Real-Time Ship-Handling Simulation 

To carry out real-time ship-handling simulation, it is necessary to build a realistic 
simulation environment and simulation scenarios. Using the ship-handling simulator 
and the INS [9], we constructed a simulation system for the developed 
knowledge-based decision-making support system. This system is shown in Figure 4. 
Such a simulation system can provide a real-time ship handling environment. It 
possesses the following features: 

• ship motion model: MMG model; 

• ship guidance method: Optimum control; 

• ship position system and its error: GPS and GPS error; 

• reproducible navigating environment and scene of ship handling; and 

• real feeling of the danger during ship handling of collision 
avoidance. 

Such a simulation environment is very useful for evaluating the developed expert 
system, because the ship-handling simulator can give operators the scene of ship 
handling and the feeling of danger during collision avoidance. The 








developed knowledge-based system is incorporated into the INS as a decision-making 
support system. It can propose a collision avoidance action to operators. 
Operators decide the final action according to their own judgement with the help of 
the support function of the INS. Meanwhile, the domain experts can easily evaluate the 
result of collision avoidance and give us their comments, requirements, and 
impressions of the system. The simulation scenario is another important factor for 
effective evaluation. It must be built to reflect a realistic navigation environment. In 
this study, the simulation scenarios are designed as follows: 

• Own ship particulars: Container: L=174m, B=26m, d=9.4m, 

Cb=0.57 and GRT = 16000GT; 

• Navigating area: Coast area, congested Japan sea; 

• Target ships are generated in terms of variable degree of difficulty 
for ship operation during the collision avoidance; 

• Own ship is controlled by the developed expert system; and 

• The own ship encounters 3 or 4 target ships on its course line 
frequently. These encounter situations are designed by navigators. 




Fig. 4 The Composition of Real Time Simulation System 

Using this simulation environment and these scenarios, we asked navigators with 
different levels of navigating experience to handle ships in two cases: manual ship 
handling and ship handling with the help of the developed expert system. Manual ship handling 






is carried out using visual information and radar information. Operators determine the 
timing of rudder operation, the collision avoidance action, and so on. In the case of 
ship handling with the help of the developed expert system, operators only need to 
confirm the action proposed by the decision-making support system. During 
simulation, the system records all the ships’ trajectories for later data analysis. To 
obtain the operators’ personal judgement of and impressions about the system, we also 
surveyed them with a set of questionnaires on the effectiveness of the 
decision-making support. 

4.4 Simulation Result Analysis 

After carrying out the simulation, we obtained simulation data and subjective 
judgements from domain experts. From these results, we derive the evaluation results 
for each evaluation item or index quantitatively or qualitatively. For fast-time traffic 
simulation, we focus on ship trajectory analysis. Using the simulated data, it is 
possible to show domain experts the time history of quantities such as the collision 
avoidance action, the action timing, the safety distance between own ship and target 
ship, and so forth. These results help us to judge the correctness of the collision 
avoidance action, the safety of ship navigation, and so on. For real-time simulation, 
we obtained simulation results together with the operators’ questionnaire responses 
giving their subjective evaluation. The operators’ evaluation can be used to determine 
the qualitative indices such as the effectiveness of the decision-making support 
information, the feasibility of the proposed action, the correctness of the collision 
avoidance action, and so forth. In the quantitative analysis of the simulation results, 
we focus on some main indices such as the safety level of own ship, the target ship 
risk corresponding to own ship’s action, and the deviation of course line. Due to space 
limitations, these results [9] cannot be included in this paper. Meanwhile, it is possible 
to compare the simulation results of manual operation with those obtained using 
expert system support in order to demonstrate the effectiveness of decision-making 
support during collision avoidance. 
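
The trajectory analysis described above can be sketched in Python as follows. The log format (time-stamped positions of both ships sampled at the same instants) and the function name are hypothetical, since the paper does not specify them. Note also that the paper's safety index is computed with Nagasawa's risk model; plain separation distance is used here only as a simple stand-in.

```python
import math


def closest_approach(own_track, target_track):
    """Return (time, distance) of the smallest own-ship/target separation.

    Each track is a list of (t, x, y) samples; both tracks are assumed to be
    logged at the same time stamps. Returns None for empty tracks.
    """
    best = None
    for (t_own, x_own, y_own), (t_tgt, x_tgt, y_tgt) in zip(own_track, target_track):
        if t_own != t_tgt:
            raise ValueError("tracks must share time stamps")
        distance = math.hypot(x_own - x_tgt, y_own - y_tgt)
        if best is None or distance < best[1]:
            best = (t_own, distance)
    return best
```

Running the same function on the logs of a manual run and of an expert-system-supported run of one scenario would give one simple, comparable number per run, in the spirit of the comparison mentioned above.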

Finally, in terms of the obtained evaluation indices, it is necessary to give an 
evaluating result for each evaluation item defined in the procedure. The following are 
our evaluating results: 

• Problem-solving ability for collision avoidance 

According to the results of the traffic simulation, the collision avoidance 
actions are reasonable and in accordance with the traffic regulations. In 
light of the deviation from the scheduled course line and the target ship risk 
corresponding to own ship’s collision avoidance actions in the real-time 
ship-handling simulation, the action of own ship does not cause any disturbance to 
target ships, and own ship can abide by the traffic regulations and stay in its lane 
in the case of route navigation. Therefore, it can be said that the developed 
expert system possesses sophisticated problem-solving ability for collision 
avoidance. 

• Ship navigation safety 

From the results of the quantitative indices, it is found that the developed 
expert system can keep a good safety level during navigation regardless of the 
navigating area. In terms of the comparison of simulation results of manual 







ship handling and ship handling using the decision-making support system, 
we found that the safety level of own ship in the case of using the expert 
system is better than that in the case of manual ship handling. 

• Decision-making support ability 

According to the evaluation indices obtained from the simulation result 
analysis and the operators’ responses to the questionnaire, we can say 
that the developed expert system provides effective support for operators’ 
decision-making. As to the decision-making support information, the 
indication of the dangerous ship was welcomed and found quite effective, but the 
information on the restricting ships and the indifferent ships was not found useful for 
operators’ decision-making. As a proposed maneuvering action for collision 
avoidance, the course line waypoints are effective and needed by operators. 

The simulation-based evaluation also revealed some problems, such as limited 
flexibility of problem-solving and limited adaptability to navigating areas with high 
traffic density. The developed system therefore needs to be improved to meet its 
requirements. 



5 Conclusions 

In this paper, the authors proposed a simulation-based procedure for evaluating 
domain-oriented expert systems and discussed the evaluation of an expert system that 
we developed to assist ship navigators in their decision-making process to avoid 
collision. From the evaluation results obtained, it is concluded that the proposed 
evaluating procedure is effective for evaluating the developed expert systems. It 
reduces development cost and improves developing efficiency of the system. The 
proposed evaluating procedure can be applied to different domains such as aviation. 
Based on the authors’ empirical experience, it is expected that criteria should be set 
up for evaluation indices. Therefore our future work is to study how to set up such 
criteria according to usability factors, evaluation environment, domain requirements, 
and system specifications. 



References 

1. Massimo Benerecetti, Fausto Giunchiglia, and Luciano Serafini, Multiagent 
System Verification via Model Checking, Proceedings of EUROVAV 98, 1998 

2. Catholijn M. Jonker, Jan Treur, and Wieke De Vries, Compositional Verification 
of Agents in Dynamic Environment: a Case Study, Proceedings of EUROVAV 
98, 1998 

3. Francky Trichet and Pierre Tchounikine, Verifying and Validating a Task/Method 
Knowledge-base, Proceedings of EUROVAV 98, 1998 

4. Mysore Ramaswamy and Sumit Sarkar, Global Verification of Knowledge Based 
Systems via Local Verification of Partition, Proceedings of EUROVAV 97, 1997 







5. Frans Coenen and Paul Dunne, Verification and Validation of Rulebases Using a 
Binary Encoded Incidence Matrix Technique, Proceedings of EUROVAV 97, 
1997 

6. Anca I. Vermesan, Knowledge-based Systems: Verification and Validation in the 
view of Certification, Proceedings of EUROVAV 97, 1997 

7. Dieter Fensel and Arno Schönegge, Specifying and Verifying Knowledge-based 
Systems with KIV, Proceedings of EUROVAV 97, 1997 

8. K. Kose and C. Yang, A Collision Avoidance Expert System for Integrated 
Navigation System and Its Brush-up, Journal of the Society of Naval Architects of 
Japan, Vol. Ill, 1995 

9. C. Yang, An Expert System for Collision Avoidance and Its Application, Ph.D. 
Thesis, 1995 

10. K. Kose, Y. Ishioka, and C. Yang, A Study on a Collision Avoidance Expert 
System for Decision-making Support in Integrated Navigation System, Journal of 
the Society of Naval Architects of Japan, Vol. 178, 1996 

11. A. Nagasawa, K.Hara, K.Inoue, and K.Kose, The Subjective Difficulties of the 
Situation of Collision Avoidance II — Toward the Rating by Simulation, Journal 
of Japan Institute of Navigation. Vol.91, 1993 




Gas Circulator Design Advisory System: A Web Based 
Decision Support System for the Nuclear Industry 



J. Menal, A. Moyes, S. McArthur, J.A. Steele, and J. McDonald 

Centre for Electrical Power Engineering, University of Strathclyde 
204 George St., Glasgow G1 1XW, United Kingdom 
Tel.: +44 (0)141 548.26.65 

judith.menal-tolsa@strath.ac.uk 



Abstract. Technological advances within the field of automation of power 
plant (electrical or nuclear) have led to the increasing need for decision support 
systems to be developed. This coincides with the increasing use of web based 
technologies in the construction of such applications in a plethora of different 
areas. The aim of this paper is to describe the application of intelligent systems 
to data obtained from the electrical and nuclear industry groups, using a 
web-oriented environment. The first stage of the work has been knowledge 
management, which is a basic requirement in order to obtain the expertise to be 
entered into the system under construction. The knowledge management stage 
produces a knowledge base, which consists of three different units: a document 
roadmap, where cross-references to the documentation system can be found; a 
knowledge model, which contains a description of the domain and its set of 
rules; and a case base, where structured information on historical solved cases 
is stored. The system is completed with a friendly User Interface. This takes the 
form of a web environment where the user is able to interact with the system in 
order to initiate a search into the information system, and to obtain the results of 
the query. 

Keywords. Decision support system, web based technologies, intelligent 
knowledge based system, knowledge management, knowledge model. 



1 Problem Description 

Like many industries, the Nuclear Industry is losing experienced engineers as a 
result of retirement. This has highlighted the problem of the loss of 
their valuable knowledge to their companies. Although they may train personnel 
before leaving the company, they are unlikely to pass on all their experience, 
and the trainees cannot be expected to learn everything in just a few sessions. For this 
reason, it was proposed to create a system that could store most of this knowledge and 
experience and then provide new engineers with focused and prioritised 
information. 

In order to demonstrate that this approach could provide a 
solution to this problem, an important and critical element in the 
operation of a nuclear plant was chosen as the subject: the Gas Circulator. When 
maintenance work or design changes are required for the gas circulators, plant 


R. Loganantharaj et al. (Eds.): IEA/AIE 2000, LNAI 1821, pp. 160-167, 2000. 
© Springer-Verlag Berlin Heidelberg 2000 







engineers must provide detailed justifications of proposed changes that demonstrate 
that the relevant technical and safety criteria have been satisfied. The specification and 
justification of design changes related to gas circulators must show consistency with 
documented standards, constraints, operating experience and the impact of these 
changes on other parts of the plant. 

This information, plus details related to the components within the gas circulator 
and their behaviour, is stored in the different parts of the proposed system. 

The resulting Gas Circulator Design Advisory System (GCDAS) is a web based 
decision support system that utilises case based, knowledge based [1] [2] and database 
techniques to provide engineers with focused and prioritised information related to a 
requested query in an easily navigable way. This facility reduces the effort required 
by less experienced engineers in identifying useful sources of information. 
Knowledge management techniques were proposed to help structure the 
information within the system, making it as close as possible to the way a human 
expert would organise it. 




[Diagram labels: Users query (Power station, Job type, Component(s)); Gas Circulator 
Information Store; Knowledge Base Result; Case Base Result; Technical interest; 
Document Search; Case description; Plant affected; Results; Reference; Comments] 

Fig. 1. High level description of the GCDAS system 



2 Application Description 

GCDAS is a decision support system which accepts user queries via a point-and-click 
web interface and returns two types of information: documentation references and 
technical information relating to the design problem and its context. Fig. 1 shows 
a high level view of the system. 
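
To make the query flow concrete, the Python sketch below filters a toy case base by the three query fields shown in Fig. 1 (power station, job type, component) and orders the hits by how many requested components they share. The record layout, the field names, and the ranking rule are invented for illustration; the paper does not describe the internal data structures of GCDAS.

```python
from dataclasses import dataclass


@dataclass
class Case:
    station: str
    job_type: str
    components: frozenset
    references: tuple  # documentation references (hypothetical identifiers)
    comments: str      # technical information about the solved case


def query_cases(case_base, station, job_type, components):
    """Return matching cases, the ones sharing most components first."""
    wanted = set(components)
    hits = [
        case for case in case_base
        if case.station == station
        and case.job_type == job_type
        and wanted & case.components
    ]
    # Prioritise cases that cover more of the requested components.
    hits.sort(key=lambda case: len(wanted & case.components), reverse=True)
    return hits
```

Each returned record carries both kinds of output named above: documentation references and technical comments on the solved case.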






2.1 Knowledge Acquisition 

The first part of the development of this project was the structured knowledge 
elicitation, transcription, modelling and validation process. 

Knowledge Elicitation. The Gas Circulator knowledge was elicited using the 
recorded interview technique between knowledge engineers and a nuclear system- 
engineering expert. These sessions consisted of a focussed discussion on particular 
topics in order to get the related specific information. Case studies have also been 
used in order to access low level information, which may not be captured in the 
focussed discussion. 

Knowledge Transcript. The recording of the knowledge elicitation session 
was then transcribed into a document containing the whole of the expertise. This 
transcript reflects the knowledge engineer’s understanding of the expert’s knowledge. 
This document needs to be handled with great care in order to avoid any ontological 
or technical confusion. 

Knowledge Modelling: KADS methodology. In the knowledge modelling activity, 
models of expertise represent the acquired knowledge. From them it is possible to 
assess the relevance of the acquired knowledge for solving a particular problem. 
Based on the analysis of the transcript, sets of models 
have been produced in accordance with the role of the acquired knowledge in the 
problem solving method. The KADS [3] methodology provides a structured 
mechanism for producing these models, and it was therefore used to analyse and 
characterise the results of the interviews. This methodology decomposes the 
knowledge into three levels of expertise: domain, inference and task. As a result, 
the use of KADS allowed the key tasks in the gas circulator design 
process to be defined and pertinent aspects of the structure and behaviour of 
these elements to be identified. 

For instance, within the knowledge that defines this domain there are descriptions 
about the entities in the plant, each of which has a behaviour that can be expressed 
through defining parameters (temperature, pressure, current, etc.). These parameters 
can be useful to describe changes in the state of defined objects and their interaction 
with others. Some of the most important parameters identified during the knowledge 
elicitation activity include: physical constraints, safety constraints, departmental 
responsibilities for design changes and cross-references. 

Knowledge Validation. An important part of the knowledge acquisition process was 
the use of knowledge validation techniques to ensure that appropriate information was 
being captured correctly, to fill the gaps in the knowledge acquired, to clarify any 
obscure or difficult aspects and even to deal with further questions. This process 
involves the submission of both transcript and knowledge models to the expert in 
order to correct any misunderstandings. This process is divided into three steps: 
validation of the transcript, validation of the knowledge models, and validation of the 
case studies. The validated knowledge transcripts, models and case studies formed a 
Gas Circulator design knowledge archive. 








[Figure labels: Gas Circulator Design Advisory System; Client; CGI; Java Interface; 
Search Engine; Case Base; Document Database; Intranet] 

Fig. 2. GCDAS Application Design 



2.2 Gas Circulator Information Store 

GCDAS utilises an intelligent search engine, which combines case based and 
knowledge based approaches to provide engineers with focused design material. A 
more specific design of the application can be seen in Fig. 2. The ‘Gas Circulator 
Information Store’ described in Fig. 1 has been factored into three elements: 

• the Knowledge Base, in which structured knowledge coming from the 
elicitation process is stored. Basically, this resource consists of a description of 
the domain and the rules associated with it; 

• the Case Base, where structured information on historical solved cases is 
stored; 

• the Document Database, which contains cross-references to the 
documentation system. 

As can be seen, the knowledge base is heterogeneous, as it is the result of integrating 
three resources of very different nature. Accordingly, the way each resource is 
searched is characterised as follows: 








• The Knowledge Base: the user wants to find one or several possible solutions to 
the design problem formulated within the domain, each of which is compatible with 
all the rules described in the knowledge model. This search is carried out by means 
of an inference engine; 

• The Case Base: to obtain historical cases similar to the one that the user is 
interested in, the search engine acts according to ‘case similarity’ (or ‘case 
proximity’), in the same way that Case Based Reasoning identifies similarity 
among cases. As justified later, GCDAS does not have automatic learning 
features, therefore no case adaptation is performed; 

• The Document Database: the search is made by means of a string comparison 
between the documents in the database and the user search word(s). 

The three search engines are integrated, and the user 
receives a single response containing the combined results 
from each individual search. If the different results contradict one another, 
it is up to the user to identify the contradiction. 
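
The combined search described above can be illustrated with a minimal sketch. It is not the GCDAS implementation (which was written in Java behind a CGI interface); the data, the field names, the similarity measure (fraction of matching query fields) and the threshold are all assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Query:
    station: str
    job_type: str
    component: str


# Hypothetical, made-up contents standing in for the three resources.
KNOWLEDGE_BASE = {
    "clutch": {"technical_interest": "Mechanical Design",
               "plant_affected": ["Gas Circulator", "Motor"]},
}
CASE_BASE = [
    {"station": "Station A", "job_type": "Design",
     "component": "clutch", "description": "General design advice"},
    {"station": "Station B", "job_type": "Maintenance",
     "component": "impeller", "description": "Blade inspection"},
]
DOCUMENTS = ["Clutch cover drawing", "Impeller inspection report"]


def search_knowledge_base(query):
    """Look up domain knowledge for the queried component (rule lookup stub)."""
    return KNOWLEDGE_BASE.get(query.component.lower())


def case_similarity(query, case):
    """Assumed similarity: fraction of query fields that match the case."""
    fields = ("station", "job_type", "component")
    matches = sum(getattr(query, f).lower() == case[f].lower() for f in fields)
    return matches / len(fields)


def search_case_base(query, threshold=0.5):
    """Return descriptions of sufficiently similar cases, most similar first."""
    scored = [(case_similarity(query, c), c) for c in CASE_BASE]
    hits = sorted((sc for sc in scored if sc[0] >= threshold),
                  key=lambda sc: sc[0], reverse=True)
    return [case["description"] for _, case in hits]


def search_documents(words):
    """Plain string comparison between the search words and document titles."""
    return [d for d in DOCUMENTS if any(w.lower() in d.lower() for w in words)]


def search(query):
    """Collate the three individual results into a single response."""
    return {
        "knowledge_base": search_knowledge_base(query),
        "case_base": search_case_base(query),
        "documents": search_documents([query.component]),
    }


result = search(Query("Station A", "Design", "clutch"))
```

No case adaptation or conflict resolution is attempted here, mirroring the text: the three results are simply returned side by side and any contradiction is left to the user.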



2.3 Search Engine 



The Java [4] Search Engine provides facilities for accessing information from each of 
these knowledge and data sources and collating the results into a structure which can 
be interpreted by the web-based interface. 

The Java programming language was used mainly for two reasons. First, because it is 
an object-oriented language, the domain hierarchy of the knowledge base could be easily 
expressed. Second, should future extensions of the system require a change in the 
general architecture (e.g. migrating from a CGI interface to an agent-based 
architecture), it was thought that an important part of the code could be reused. 

The web user interface, written in HTML [5], consists of a page containing a form 
in which the user enters the information (words, ticks, etc.) needed to define the 
knowledge he is looking for, and a page that shows the results. The 
communication between the search engine and the web pages has been developed 
using the Common Gateway Interface (CGI) [6]. 

An example of the results obtained after a search in the system is detailed in Fig. 3. 







SEARCH DETAILS 

  Station:     Hunterston 
  Plant Item:  Anti-Reverse Clutch 
  Job Type:    Design 

KNOWLEDGE BASE RESULT 

  Technical Interest:  Mechanical Design 
  Plant Affected:      Gas Circulator; Motor 
  Comments:            The anti-reverse clutch prevents the motor shaft from rotating in a single direction. 

CASE BASE RESULT 

  Case Description:  General Design Advice; Stiction Measurement; Long Term Review 
  Reference:         None 

DOCUMENT SEARCH RESULT 

  References: 
    4/013 ADM 13 - LSE - 'Steiber Clutch Cover' 
    23/005 NNC C33/C34/PSD/234/049 IssB Appendix II "Circulator Running Speed in Event of Quadrant Trip and Anti-Reverse Clutch Failure" 

Fig. 3. Example of a search result in GCDAS 

Following the description above, the overall structure of the GCDAS 
implementation is shown in Fig. 4. 



3 Application Building 

The GCDAS system was created by a multi-disciplinary group of researchers. The 
team was composed of four people: an expert in knowledge 
elicitation, an experienced user interface developer and web page 
constructor, a computer scientist, and a fourth member who 
collaborated on software development issues. 

With the computer engineer dedicated full time to the project and the rest of the 
team working part time, the project lasted around sixteen months, from the start 
of the elicitation process to the installation in the company and 
the training of the users. 

The system was installed on a server machine at the company and has 
been used since then by new staff as well as experienced engineers, who were able to 
give feedback on the system design. Very little training was required because all of 
the final users had experience of using web pages. Consequently, the only 
new skills they had to be taught were what to expect from the searches and how 
to make these searches more accurate and relevant to the information 
they were interested in finding. 







Fig. 4. GCDAS Implementation 

On the other hand, as new design work is constantly undertaken, new case studies 
and documents are generated and then the knowledge base would need to be 
modified. At present, updates to the document database are controlled by company 
procedures. The maintenance of the case base and knowledge base will require similar 
procedures, potentially involving further knowledge elicitation and validation 
sessions, to ensure that new cases and knowledge added to the system are known to 
be correct. For this reason GCDAS has no automatic learning facility. 



4 Application Benefits 

This paper has demonstrated the integration of case based reasoning, knowledge 
based systems and database techniques using a web based approach to providing 
design engineers with useful design information. A structured approach to acquiring 
the knowledge required for the realisation of GCDAS has been outlined. This 
approach could be used in the extension and maintenance of the system. 

The main benefit of developing GCDAS has been to demonstrate that expert 
knowledge in a particular subject can be compiled and structured 
in such a way that it can be easily accessed by other experts or by people in 
training. This has been demonstrated for a particular case study in the nuclear 
industry, but the approach can be used in any other type of business for similar 
purposes. 

Some further plans to improve the system have already been pointed out: 

• Extension of GCDAS: the methodologies adopted for GCDAS could be applied to 
extend the system to consider further power stations, plant and nuclear systems. 
The design and implementation of the extended system would preserve the 
structure of the existing GCDAS. 







• Web Based Knowledge Servers for Plant Management: GCDAS considers design 
modifications to Gas Circulator plant; other plant management issues that could 
be addressed by this approach include monitoring, diagnostics, maintenance 
management and performance monitoring. The realisation of such systems 
involves combining the output from a range of sources including intelligent 
systems. Furthermore, the integration of real-time data sources is an issue in 
constructing such systems. Therefore, strategies for managing the presentation of 
information which address the context of use, for example, and issues of safety 
criticality of user competence, would have to be built into the knowledge server. 

• Knowledge Maintenance: a long-term issue in web based knowledge servers is 
the requirement of keeping the knowledge utilised by these servers up to date 
with the current state of knowledge within the organisation. It is likely that web 
based knowledge servers will be most successful as part of an effective corporate 
knowledge management system. 



Acknowledgement 

The authors would like to acknowledge the support of Scottish Nuclear Ltd. (UK) 
personnel in the conduct of this work. 



References 

1. Schmidt, J.W., Thanos, C., Foundations of Knowledge Base Management: 
Contributions from Logic, Databases, and Artificial Intelligence Applications, 
1989 

2. Frost, R.A., Introduction to Knowledge Base Systems, Collins Professional and 
Technical, 1986 

3. Wielinga, B.J., Schreiber, A.Th., Breuker, J.A., KADS: A Modelling Approach 
to Knowledge Engineering, Knowledge Acquisition, 4 (1), 1992, pp 5-53 

4. Wright, C., Java, Teach Yourself Books, 1998 

5. Lampton, C., Home Page: an Introduction to Web Page Design, New York: 
Franklin Watts, 1997 

6. Gundavaram, S., CGI Programming on the World Wide Web, O’Reilly & 
Associates, 1996 




Expert Systems and Mathematical Optimization 
Approaches on Physical Layout Optimization Problems 



Julio C. G. Pimentel 1, Yosef Gavriel 2 and Eber A. Schmitz 3 



1 Dept. of Elect. & Comp. Eng., Laval University, Ste-Foy, PQ, Canada G1K 7P4 
pimentel@ieee.org 

2 Dept. of ECE, Virginia Tech, 610 N. Main St., Blacksburg PMB288, VA 24060, USA 
yosef gavriel@computer.org 

3 NCE, Federal University of Rio de Janeiro, UFRJ, Ilha do Fundao, RJ, Brazil 



Abstract. This work presents a new approach to the problem of component 
placement on printed circuit boards. It describes a system for automatic 
placement (SAP) on printed circuit boards (PCB) that uses both artificial 
intelligence techniques (expert systems) and classical optimizing algorithms. 
The previous approaches model the placement problem as a classical 
optimizing problem and they do not take into account the intrinsic circuit 
features. This work researches the use of empirical knowledge acquired from 
PCB designers and classical algorithmic techniques to improve the placement 
algorithm performance and final art result. Starting from component and net 
lists, SAP identifies and classifies groups of these objects, which are important 
to the problem domain. Afterwards, it uses a rule base to find the relative 
placement between components. Finally, the relative placement is optimized to 
minimize the total wire length and to equalize the distribution of wires on the 
board. 



1 Introduction 

The problem of automatic placement of electronic components on printed circuit 
boards (PCB) has been treated so far as a classic problem of nonlinear optimization. 
The widely adopted strategy consists of transforming the placement problem into a 
quadratic assignment problem and solving this new problem using one of the many 
procedural optimization methods available in the literature [5,7,9]. This strategy 
generally does not take into consideration the functional characteristics of the 
circuit and has therefore not been able to find the best placement for certain types 
of boards, in particular bus-structured digital PCBs. However, more than seventy 
percent of all PCBs manufactured at present are of this type. Some 
previous works considered the use of artificial intelligence techniques in the 
solution of automatic placement problems in order to include in the problem model 
the circuit functional characteristics and the strategies acquired by specialists 
in PCB layout over several years of work experience [9]. These approaches 



Also known as J.C. Yosef Gavriel Ben Abraham Tirat-Gefen De Souza Batista 

R. Loganantharaj et al. (Eds.): IEA/AIE 2000, LNAI 1821, pp. 168-173, 2000. 

© Springer-Verlag Berlin Heidelberg 2000 







generally produce better solutions than the ones achieved by classic methods, but at a 
much higher computational cost, and in practice they can only be used for circuits 
of low complexity [1,4]. This work investigates the use of empirical knowledge used 
by PCB designers, along with classic algorithms, aiming to direct the search for the 
solution and to improve the placement quality without unnecessarily increasing the 
processing time. 

The placement problem consists of mapping the components onto the PCB area while 
minimizing some cost function. This function must be defined in such a way that the 
final layout satisfies some restrictions: generally, the routability of the PCB must be 
maximized and the layout must occupy the least possible area in order to decrease the 
cost of the board. However, in the majority of practical cases it is impossible to 
satisfy all those restrictions at the same time. Therefore, a cost function that 
minimizes some criteria while keeping the other constraints within acceptable limits 
must be chosen carefully. Minimization of the total wire length is often accepted as a 
criterion that partially satisfies these objectives. 
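
As an illustration of a total wire length cost, the sketch below estimates the length of each net by the half-perimeter of the bounding box of its components. The text does not say which wire length estimate SAP uses, so this estimate, the function names and the example coordinates are assumptions.

```python
def net_length(net, positions):
    """Half-perimeter of the bounding box enclosing all components on a net.

    This is an assumed estimate of wire length, not necessarily SAP's.
    """
    xs = [positions[component][0] for component in net]
    ys = [positions[component][1] for component in net]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


def total_wire_length(nets, positions):
    """Cost function: sum of the estimated lengths of all nets."""
    return sum(net_length(net, positions) for net in nets)


# Two nets over three components (hypothetical example data).
nets = [["U1", "U2"], ["U1", "U2", "U3"]]
placement_a = {"U1": (0, 0), "U2": (3, 4), "U3": (6, 0)}
placement_b = {"U1": (0, 0), "U2": (3, 0), "U3": (6, 0)}

cost_a = total_wire_length(nets, placement_a)  # (3 + 4) + (6 + 4) = 17
cost_b = total_wire_length(nets, placement_b)  # 3 + 6 = 9
```

Under this cost, placement_b would be preferred. Other objectives mentioned in the text, such as balancing the wire distribution over the board, are not captured by this function.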



2 Description of the Proposed System 

PCB designers use an ad-hoc strategy that can be summarized as follows: the 
specialists initially partition the circuit into functionally independent modules, 
taking into consideration the structure of the buses and the functionality of the 
modules; they determine the relative placement of the modules on the PCB based on 
the function of each block, the position and number of external connectors, and 
other conditions; they do the same for the placement of the components belonging to 
each module; finally, they increase the board routability by optimizing the relative 
placement between modules in order to minimize the total wire length. Like human 
designers, SAP [2] uses a three-phase placement strategy: partitioning, relative 
placement and optimization. In our system, these phases are implemented using 
empirical rules that represent the specialist knowledge. In situations where it was 
not possible to obtain rules (either because they do not exist or because they are 
too difficult to identify), and in some cases for efficiency reasons, SAP uses 
classic optimization algorithms. Note that the number of components treated by 
algorithms can be decreased by inserting new rules into 
the knowledge base. The last phase optimizes the final placement to decrease the 
total wire length and to balance its distribution over the board, thus preventing 
congestion. 



3 System Implementation 

Figure 1 shows the SAP architecture. Each of the phases described above is 
mapped to a program module. The phases of fusion, ordering and relative placement 
are grouped in the program module called ALEX, which corresponds to the intelligent 
part of the system. These phases contain a set of rules that represent the knowledge of 
the layout specialist, and algorithms that handle the situations for which it was not 
possible to identify rules. The overlapping elimination, component placement and 
optimization phases are entirely algorithmic. The database used in the system is 
essential for capturing design information. 








Fig. 1. SAP Architecture 

The SAP data model is implemented on a database that supports both the algorithmic 
and rule-based parts of SAP. Examples of the objects used in the SAP data model are 
given below. The design of SAP’s database followed an object-oriented 
approach [3,8]. During the analysis process the following relationships were 
identified: belongs-to associates objects such as components and 
modules, nets and buses, and pins and packages; is-of-type associates components with 
devices and components with packages; is-connected-to associates a pin of a component 
with the net it is on; one-to-many (1:M), as in a bus owns nets or a module owns 
components; many-to-many (M:N), as in modules are on buses or components are 
on nets. Some nets are also part of buses, so it is important to know to which bus a 
particular net belongs. Many-to-many (M:N) relationships are stored in correlation 
tables in the SAP data model, such as TBM and the connection table. The objects 
identified in the conceptual model of the database [2] are grouped according to their 
nature. The first group is the system library and includes the electronic definition of 
the packages, pins and devices. Those objects are stored in a static database and 
represent general information that can be used by all PCBs. The second group 
captures information specific to the PCB currently being designed. The objects are 
mapped to PROLOG facts in the following way: the names of the objects and 
relationships correspond to predicates, and the attributes of these objects and 
relationships are the arguments of the facts. The facts are used by the empirical rules 
acquired from the experts. 
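
The mapping of objects and relationships to facts can be sketched as follows. SAP stores them as PROLOG facts; the Python tuples below only imitate that idea, and every predicate name, argument order and data value is a hypothetical example rather than SAP's actual schema.

```python
# Each fact is (predicate, *arguments), imitating a PROLOG fact such as
# belongs_to(d0, data). All names and values are hypothetical.
FACTS = {
    ("component", "u1", "ram_chip", "dip14"),  # name, device type, package
    ("component", "u2", "cpu", "dip40"),
    ("belongs_to", "d0", "data"),              # net d0 belongs to bus "data"
    ("belongs_to", "a0", "address"),
    ("is_connected_to", "u1", 3, "d0"),        # pin 3 of u1 is on net d0
    ("is_connected_to", "u2", 12, "d0"),
    ("is_connected_to", "u2", 20, "a0"),
}


def query(predicate, *pattern):
    """Return argument tuples of matching facts; None acts as a wildcard."""
    results = []
    for fact in FACTS:
        if fact[0] != predicate or len(fact) - 1 != len(pattern):
            continue
        if all(p is None or p == a for p, a in zip(pattern, fact[1:])):
            results.append(fact[1:])
    return sorted(results)


def components_on_bus(bus):
    """Resolve the M:N relation 'components are on buses' through the nets."""
    nets = {net for net, _bus in query("belongs_to", None, bus)}
    connections = query("is_connected_to", None, None, None)
    return sorted({comp for comp, _pin, net in connections if net in nets})


on_data_bus = components_on_bus("data")
on_address_bus = components_on_bus("address")
```

A rule base would sit on top of such queries; in PROLOG the pattern matching shown in `query` is provided by the language itself.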



3.1 The Expert System for Allocation - ALEX 



The first phase in the placement process is to determine the relative positions of 
the components connected to buses. The program module ALEX, responsible for this 
task, comprises rules and algorithms that take into consideration the structure of the 
buses, the functional characteristics of the components and the topology of the 
external connectors. In ALEX, the empirical knowledge acquired from the layout 
specialist is stored in the form of production rules since these are adequate to 









represent expert knowledge on PCB design, and it is easier to add, change and remove 
data and rules than with procedural approaches. The intelligent part was 
implemented in TURBO PROLOG. The knowledge base acquired from the layout 
specialists is stored in the form of clauses (facts and rules), while the inference 
engine is embedded in the PROLOG compiler. The various tasks carried out by ALEX are 
distributed among four expert systems in charge of proposing solutions within their 
domain of knowledge. Each expert system deals with only some aspects of the problem. 
The relative placement of the components of the sub-circuits connected to the buses is 
achieved by activating each expert in sequence. 

3.1.1 The Expert for Pre-processing (GRAVA) 

The main function of the pre-processor GRAVA is to read the input data that 
describes the system and to construct the database that will be used by all the other 
modules of SAP. The input data comprise three items: the configuration file, which 
contains information related to the printed circuit board, such as useful area, a list 
of fixed components (normally connectors), and the list of special nets (the current 
implementation of SAP recognizes only the supply nets, Vcc and Gnd); 
the net file, which describes the topology of the circuit and contains the lists of 
nets and components; and the system library, which contains the physical description 
of the packages (e.g., DIP14, 14 pins, area (-127,-127,1651,889), pin positions) and the 
functional type of each electronic device. GRAVA gathers all the information that will 
be used by the other expert systems. Initially, it searches the system library for the 
device type and package of each component. Afterwards, the list of nets is searched and 
the existing circuit buses are identified and classified. 

3.1.2 The Expert for Module Fusion (REM) 

The REM expert groups two or more components that are strongly connected 
among themselves into modules that are treated as a unit. To group 
the components, REM takes into consideration information such as the 
functional type of the module, its form, the size of the printed circuit board, and the 
sizes and forms of the components. In this way, it can be guaranteed that all the 
modules can be placed on the board. Moreover, the bus structure 
plays an important role in the placement. Figure 2 shows some statistical results for 
two typical bus-structured PCBs, which show that most of the components are 
connected to small buses and therefore must be placed in the same region of the 
board. We have used a length equal to 10 to separate a global bus from a local bus. 

3.1.3 The Expert for Module Ordering (ROM) 

ROM receives the modules generated by REM and orders the components inside 
each module, considering the component type and how the component pins are 
hardwired to each bus. ROM contains rules to order memory banks and modules 
connected to the primary buses. The memory banks are ordered by a rule that 
recognizes the module type and activates an algorithm responsible for constructing the 
memory matrix. The modules connected to a primary bus are ordered by the method 







summarized here. The structure of the connections between buses and components is 
mapped to a tree in which the root contains the components that have the most 
connections to the buses, while the branches correspond to the components that have 
fewer connections to the buses. 



[Figure: two plots of the percentage of pins (up to 100%) against bus length 
(ticks at 2, 5 and 10): a) microprocessor board; b) memory board] 

Fig. 2. Statistical results 



3.1.4 The Expert for Placement (RPM) 

This specialist is responsible for determining the relative position of the modules in 
the board. To execute its task, RPM considers some characteristics of the board and 
the module, such as the size, the form factor of the module and the position of the 
board connectors. RPM determines the relative position of the modules connected to 
buses by first calculating the center of gravity CGc of the connector pins connected to 
the module. Afterwards, RPM calculates the center of gravity CGm of the module 
and places the module adjacent to the connector in such a way that CGm is lined up 
with CGc. Figure 3 exemplifies this strategy for a bus. 
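
The alignment of CGm with CGc can be sketched as below. The text does not give coordinates or board orientation, so this example assumes a connector on the left edge of the board, pins given as (x, y) points, and a hypothetical `gap` between connector and module.

```python
def center_of_gravity(points):
    """Arithmetic mean of a list of (x, y) points."""
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)


def place_module(connector_pins, module_pins, gap=10.0):
    """Translate a module so that CGm lines up with CGc.

    Assumptions: the connector lies on the left board edge, so 'lined up'
    means equal y coordinates, and the module is shifted so that its
    leftmost pin sits `gap` units to the right of the connector.
    """
    _, cgc_y = center_of_gravity(connector_pins)
    _, cgm_y = center_of_gravity(module_pins)
    dy = cgc_y - cgm_y
    dx = (max(x for x, _ in connector_pins) + gap
          - min(x for x, _ in module_pins))
    return [(x + dx, y + dy) for x, y in module_pins]


connector = [(0, 10), (0, 20), (0, 30)]       # CGc = (0, 20)
module = [(0, 0), (4, 0), (0, 2), (4, 2)]     # CGm = (2, 1)
placed = place_module(connector, module)
```

After the translation the module's center of gravity has the same y coordinate as that of the connector pins, while its shape is unchanged.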



4 Conclusion 

This work presented the development of an automatic placement system that couples 
artificial intelligence techniques with optimization algorithms to place 
electronic components on printed circuit boards. The methodology used for modeling 
the data proved to be of great help in the analysis and specification of the system 
databases. The tests carried out so far show that the results achieved with the 
prototype are better than those obtained with classic techniques, which consist of 
force-directed placement algorithms [5,6,7] followed by overlapping elimination. 
Moreover, turning off the expert system in the module ALEX significantly deteriorates 
the final quality of the layout, which suggests that the existing rules do capture 
the knowledge of the human specialist. 








References 

[1] ODAWARA, G., IIJIMA, K., WAKABAYASHI, K., "Knowledge-Based 
Placement Technique for Printed Wiring Boards", 22nd Design Automation Conf. 
Proc., 1985, pp 616-622. 

[2] PIMENTEL, J. C. G., "Sistema Baseado em Regras Para Posicionamento de 
Componentes Eletrônicos", M.Sc. dissertation, Universidade Federal do 
Rio de Janeiro, April 1990. 

[3] SHLAER, S., MELLOR, S. J., "Object-Oriented Systems Analysis: Modeling 
the World in Data", Prentice-Hall, 1988. 

[4] STEINBERG, L., "The Backboard Wiring Problem: A Placement Algorithm", 
SIAM Rev., Vol. 3, No 1, 1961, pp 37-50. 

[5] FORBES, R., "Heuristic Acceleration of Force-Directed Placement", 24th Design 
Automation Conf. Proc., 1987, pp 735-740. 

[6] GOTO, S., KUH, E. S., "An Approach to the Two-Dimensional Placement 
Problem in Circuit Layout", IEEE Trans. on Circuits and Systems, Vol. 25, No 
4, 1978, pp 208-214. 

[7] HANAN, M., KURTZBERG, J. M., "A Review of the Placement and Quadratic 
Assignment Problems", SIAM Review, Vol. 14, No 2, 1972, pp 324-342. 

[8] HARMON, P., KING, D., "Expert Systems: Artificial Intelligence in Business", 
John Wiley and Sons, Inc., 1985. 

[9] LIN, S., GAJSKI, D. D., "LES: A Layout Expert System", 24th Design 
Automation Conf. Proc., 1987, pp 672-679. 





Locating Bugs in Java Programs — First Results 
of the Java Diagnosis Experiments Project* 



Cristinel Mateis, Markus Stumptner, and Franz Wotawa 



Technische Universität Wien, Institut für Informationssysteme 
Database and Artificial Intelligence Group 
Favoritenstraße 9-11, A-1040 Wien, Austria 
{mateis,mst,wotawa}@dbai.tuwien.ac.at 



Abstract. This paper describes the use of model-based reasoning for 
locating bugs in Java programs. Model-based diagnosis is a technology 
that uses a declarative, generic description of the behavior of the compo- 
nents occurring in a domain to construct a model of the overall system 
which can then be used at the desired level of abstraction to predict a 
system’s behavior and derive assumptions about which parts of the sys- 
tem are incorrect. This approach is particularly enticing when applied to 
software since the model can be constructed from the program automat- 
ically. However, the actual choice of models poses interesting challenges. 
We show a simple model based on dependencies that can be used to 
diagnose very large programs, and walk through an example debugging 
session. 



1 Introduction 

Debugging, i.e. locating and correcting faults in programs, is generally assumed 
to be a difficult task. This holds especially in the case where a programmer 
needs to debug a program written by a different person. One reason is the size of 
the search space for debugging and the implicit connection between parts of the 
specification (i.e., the intended behavior of the program), and parts of the pro- 
gram (i.e., the realization). These implicit links are usually manifested as part of 
the programmer’s mental model. In order to overcome the complexity problem 
and to eliminate large portions of the program from the focus of interest, pro- 
grammers use slices [16]. Slices are program fragments representing statements 
that influence the value of a given variable. They can be automatically derived 
from the syntactical structure of a program at compile time or runtime. Other 
approaches to software debugging use logical models of the program [13,5,11], or 
dependencies between variables or the flow of control [8,6]. Apart from general 
program debugging, other researchers present approaches for dedicated tutoring 
systems [7,10]. All of the above research directions of program debugging have 

* This project was partially supported by the Austrian Science Fund (FWF) under 
project grant N Z 29-INF. Cristinel Mateis was funded by the FWF under grant 
P12344-INF. 



R. Loganantharaj et al. (Eds.): IEA/AIE 2000, LNAI 1821, pp. 174-183, 2000. 
© Springer-Verlag Berlin Heidelberg 2000 






in common that they rely on specific algorithms and modeling paradigms. Some 
of them are limited to a programming language and some require an additional 
specification (causing additional learning and writing effort). 

Apart from ‘traditional’ debugging research, in recent years several attempts 
at using model-based diagnosis (MBD) [12,3] for locating faults have 
been published. Some of them [2,1] show how MBD can be used for debugging 
Prolog-like programs. They compare the outcome of the MBD approach with 
Shapiro’s Algorithmic Debugging [13] and claim that MBD requires less user 
interaction for locating the fault. Other papers [4,17,14] show how MBD can be 
used for debugging hardware description languages (using VHDL as example) 
and functional programming languages. Some of the ideas described therein can 
be used for almost all programming languages including procedural languages. 
To use MBD principles for software debugging a model must be available. A 
model or system description SD in the terminology of MBD describes the be- 
havior of system components COMP and the structure of the system to be 
diagnosed. In model-based software debugging a model is the logical description 
of the potentially faulty piece of software, e.g., the set of statements and expres- 
sions used in the program. In addition to the model, a set of observations OBS 
of the system behavior is required. Then the model together with the observa- 
tions is used directly for identifying the faulty parts. A basic advantage of the 
approach is that only the correct behavior of components must be known. 

The necessary logical description, i.e., the model, of a program can be derived 
from the program and the programming language. The specification of a pro- 
gramming language describes the semantics of a program. The semantics are not 
given individually for every program, instead the behavior of the different lan- 
guage constructs is defined, and these constructs, e.g., conditional statements, 
expressions, or operators, can then be mapped to diagnosis components. For 
each of them, a logical model fragment is built. These model fragments, together 
with the syntax of a concrete program, allow a logical description of the program 
to be derived automatically at compile time. Once a model is available, test cases, 
e.g., method calls with given argument values, can be used as observations in 
the search for a bug. The actual diagnosis step uses a standard model-based 
diagnosis algorithm, which returns diagnoses, i.e., sets of components that may 
be faulty. In our case these correspond to the above program constructs, i.e., the 
diagnosis maps back directly to the source code of the program. 

The purpose of this paper is to give an overview of how Java programs can 
be modeled for model-based diagnosis. Java is an object-oriented programming 
language. It is strongly typed and has a C-like syntax. A Java program is a collec- 
tion of classes, and each class has a (possibly empty) set of instance variables and 
methods used to implement the desired behavior of the class. For our purposes 
we assume that the programs to be debugged can be compiled, i.e., they must 
be syntactically correct and pass the semantics analyses. The methods described 
in this paper are for locating functional faults detected at runtime. Functional 
faults occur whenever the program terminates with a wrong variable value. In 
addition, we assume that the correct program is a small variant of the buggy
program. Faulty programs requiring substantial changes, e.g., new instance
variables or classes, are outside the scope of this paper. The paper aims at providing
a model for locating a bug in a method of a Java class. Since methods are similar
to procedures or functions, the results of previous research [4,17,15,14] can be
used. However, there are some differences. First, in Java the methods to be called
need not be known at compile time. This behavior is called late binding. The
second difference is that Java uses global variables, including class variables.

In the rest of this paper we present an abstract model which can be used for 
debugging Java programs where statements are mapped to diagnosis components 
and variables are mapped to the connections between the diagnosis components. 
As a result, the granularity of the diagnosis result is set to the statement level:
statements can be classified as being correct or faulty. A finer classification,
such as identifying individual expressions inside a statement as the source of a
bug, is not possible with the model. In return, this choice allows our model
to support fast debugging of larger programs.



2 The Dependency Model 

In this section we describe how to convert a given program into a dependency 
model by statically analyzing the statements of the methods appearing in the 
classes of that program. The conversion is performed at compile time, i.e., its
output does not depend on values which arise at run time, and proceeds in two
steps. First the program is compiled to a functional dependency representation,
and afterwards the resulting dependencies are mapped to a logical model used
for debugging.



2.1 Defining Functional Dependencies 

A Java method m may contain two kinds of variables: simple variables (i.e., 
variables whose values are of simple type, e.g., integer, boolean, etc.) and vari- 
ables of reference type (i.e., variables whose values are not of simple type, e.g., 
objects and arrays). The variables of m comprise the variables locally declared 
in m, the fields of the class owning the method m, the formal parameters of m, 
and the fields of the variables of reference type occurring in m. Because of the 
phenomenon of dynamic binding of the variables of reference type, i.e., that 
the same variable may point to different objects in the memory during differ- 
ent points in the execution of the same call of m, we assign auxiliary locations 
to variables of reference type. These locations are denoted by order numbers 
which correspond to the locations created in memory at run time whenever a 
constructor-invocation statement is encountered. When a new object is created
by a constructor invocation, the variables arising from the fields of the new
object are bound to the location referred to by that object, i.e., the l-value of the
object, and not to the value of the object, i.e., the r-value of the object¹. In
this way we can properly keep track of the variables arising from the fields of
the objects which can be accessed through different variables of reference type
pointing to the same memory location (i.e., the aliasing phenomenon). On the other
hand, due to selection statements, the location assigned to a variable of reference
type might not be known precisely at a given point of the method at compile
time: there might be several candidate locations, only one of which corresponds
to the memory location assigned to that variable at run time. Since we assume
static analysis, we take all these possible locations into consideration.

A variable is represented in our model either by (i) its name (e.g., v) if it
is a formal parameter or a local variable declared in the method, or (ii) by its
name preceded by the number of a location (e.g., 4::v) if it is the field of an
object. In the following, whenever we talk about variables, we mean variables of
one of the forms (i) and (ii). We have to consider that the variables occurring
in the methods change their values during program execution. This is handled 
by assigning a unique index to all occasions where a variable occurs as target of 
an assignment. That is, a variable v occurring in a method m may change its 
value several times during the execution of the same call of m and each time, the 
occurrence of v is assigned a different index. It is the various variable occurrences 
that dependencies are computed for. 
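
The indexing scheme can be pictured with a small Java sketch. It is only an illustration: the class `OccurrenceIndexer`, its method names, and the `name_index` notation are assumptions made here and are not taken from the paper. Index 0 stands for the value a variable has on entry to the method, and every assignment target receives the next free index.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper (not from the paper): numbers the occurrences of a
// variable so that every assignment target gets a fresh index.
class OccurrenceIndexer {
    // Highest index handed out so far for each variable name.
    private final Map<String, Integer> lastIndex = new HashMap<String, Integer>();

    // Occurrence that a read of 'name' refers to; index 0 is the value
    // the variable has on entry to the method.
    String current(String name) {
        Integer i = lastIndex.get(name);
        return name + "_" + (i == null ? 0 : i.intValue());
    }

    // Occurrence created when 'name' is the target of an assignment.
    String define(String name) {
        Integer i = lastIndex.get(name);
        int next = (i == null ? 0 : i.intValue()) + 1;
        lastIndex.put(name, next);
        return name + "_" + next;
    }

    public static void main(String[] args) {
        OccurrenceIndexer ix = new OccurrenceIndexer();
        // x = a + 1;   x = x * 2;
        String rhs1 = ix.current("a");
        String lhs1 = ix.define("x");
        String rhs2 = ix.current("x");
        String lhs2 = ix.define("x");
        System.out.println(lhs1 + " depends on " + rhs1);
        System.out.println(lhs2 + " depends on " + rhs2);
    }
}
```

Reading the right-hand side before defining the target matters: in `x = x * 2` the use refers to the older occurrence and the target to the newer one, which is what keeps the resulting dependency graph acyclic.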

Let x be an indexed variable occurrence of m and y either an indexed variable
occurrence or a constant of m. We say that x depends on y iff a change of the
value of y results in a change of the value of x. In other words, x depends on y
iff there is an execution $e_1$ of m such that if at some point t during $e_1$ we alter
the value of y (thus producing a new execution, $e_2$), then there is a later point t'
such that the value of x in $e_1$ is different from its value in $e_2$. (Note that if
conditions in m were dependent on the value of y, the code executed between t
and t' may actually be different in $e_1$ w.r.t. $e_2$ since they may branch differently.)
We say that x depends on a method n iff there is a method call of n in m s.t.
either (i) x is modified by side effects of that method call of n, or (ii) x depends
on the return value of that method call of n. Formally, a functional dependency
is a pair $(x, M_x)$, where x is a variable occurrence and $M_x$ is a set of variable
occurrences, constants, and methods, such that x depends on every $y \in M_x$.

As mentioned previously, our dependency model is for debugging on the
statement level. Therefore, we are interested in computing the dependencies
for statements. The set of all dependencies for all statements can be viewed
as a graph. Statements are nodes, and the connections are formed by the variable
occurrences inside the statements, i.e., statements $C_i$ and $C_j$ are connected
iff one statement establishes the functional dependency $(v_i, M)$ and the other,
$(w_j, \{\ldots, v_i, \ldots\})$. Variable occurrences $v_0$ are inputs of the graph. A variable
occurrence $v_i$ is an output iff there is no other occurrence $v_j$ such that $j > i$. Since
during conversion indices are always increased, the resulting graph representation
is acyclic. No feedback loops occur in the system. This is different from the
abstract dependency model in the domain of debugging concurrent programming
languages [4], where explicit handling of feedback cycles is required.

¹ In Java, every variable has an r-value, which is defined as the actual value stored in
the variable, and an l-value, which is defined as the name of the variable
associated to the location in memory where the r-value is stored.



2.2 Method Conversion 

A method m of a Java program is converted by sequentially converting its
statements into diagnosis components. Before the conversion of the statements
of the method into diagnosis components starts, default locations are created
for the formal parameters of reference type and for the fields of reference type
deriving from the formal parameters. To the object o for which the method m is
supposed to be called² we assign the default location with the order index 0, i.e.,
those fields u of o which are accessed in m are represented as 0::u in our model.

Basically, a diagnosis component of m refers to a statement of m and contains 
the functional dependencies related to the occurrences of the variables which are 
modified in that statement. Note that a single statement may give rise to several
functional dependencies because several variables may be modified by
side effects. Consider for instance a method call which occurs in a mathematical
expression whose ultimate value is assigned to a variable. Such a call could,
apart from returning a numerical value, modify instance variables of some of
its actual parameters of reference type. In this paper we consider four kinds of
statements (assignments, selections, iterations, and method calls) and sketch
how they are converted into a set of dependencies. See [9] for a more detailed
technical description of the conversion.

Assignment Statement An assignment statement consists of a left-hand 
side which always leads to a variable and a right-hand side which is an expres- 
sion. An expression may be a constant, a variable, a method call, or an operation 
whose operands are expressions. The assignment statement is converted into a 
diagnosis component containing the dependency of the variable occurrence cor- 
responding to the left-hand side on the constants, variable occurrences, and 
methods appearing in the right-hand side. In addition, side effects of method
calls appearing in the assignment statement may modify other variable occurrences
visible in m; each of these gives rise to a further functional dependency on
constants, methods, and variable occurrences visible in m.
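
As a minimal sketch of the simplest case, an assignment without side effects, the following Java fragment builds one dependency pair from an assignment. The class `AssignmentDependency` and the assumption that a parser has already listed the names occurring in the right-hand side are illustrative choices made here, not the authors' implementation; indices are ignored for brevity.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;

// Illustrative only: a functional dependency (x, M) and the conversion of an
// assignment without side effects into a single dependency.
class AssignmentDependency {
    final String target;        // x: occurrence on the left-hand side
    final Set<String> sources;  // M: occurrences, constants, and methods x depends on

    AssignmentDependency(String target, Set<String> sources) {
        this.target = target;
        this.sources = sources;
    }

    // 'rhsNames' is assumed to come from a parser that lists every variable
    // occurrence, constant, and method name found in the right-hand side.
    static AssignmentDependency fromAssignment(String target, String... rhsNames) {
        return new AssignmentDependency(target,
                new LinkedHashSet<String>(Arrays.asList(rhsNames)));
    }

    @Override
    public String toString() {
        return "(" + target + ", " + sources + ")";
    }

    public static void main(String[] args) {
        // x = y + n(3);  x depends on y, on the method n, and on the constant 3
        System.out.println(fromAssignment("x", "y", "n", "3"));
    }
}
```

Dependencies caused by side effects of `n` would be added as further pairs in the same diagnosis component; they are left out of this sketch.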

² Remember that the analysis is static and we do not actually know the object o for
which the method m is invoked; therefore we have to use a generic placeholder object
for the purpose.

Selection Statement A selection statement consists of a boolean selection
expression and two sets of statements corresponding to the “then” and “else”
branches, respectively. First the diagnosis components due to side effects of the
boolean selection expression are computed. Then the statements of each branch
are converted into the corresponding sets of diagnosis components which are
stored in the two appropriate instance variables of the diagnosis component
corresponding to the selection statement. The functional dependencies of the
diagnosis component corresponding to the selection statement are obtained by
summarizing the functional dependencies of the diagnosis components of the two
branches for the variables visible outside the selection statement.
To this set are added the dependencies on the operands of the boolean selection
expression.
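
The summarization step might be sketched as follows. The map-based representation (target variable to set of sources) and the class `SelectionSummary` are assumptions of this sketch, and indices are ignored. One further assumption is made explicit in the code: a variable assigned in only one branch keeps its old value in the other branch, so it is taken to depend on itself there.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Illustrative sketch: merges the dependencies of the two branches of a
// selection statement and adds the operands of the condition.
class SelectionSummary {
    static Map<String, Set<String>> summarize(Map<String, Set<String>> thenDeps,
                                              Map<String, Set<String>> elseDeps,
                                              Set<String> conditionOperands) {
        Set<String> targets = new TreeSet<String>(thenDeps.keySet());
        targets.addAll(elseDeps.keySet());
        Map<String, Set<String>> result = new TreeMap<String, Set<String>>();
        for (String x : targets) {
            // Every variable assigned in a branch depends on the condition.
            Set<String> m = new TreeSet<String>(conditionOperands);
            addBranch(m, x, thenDeps);
            addBranch(m, x, elseDeps);
            result.put(x, m);
        }
        return result;
    }

    private static void addBranch(Set<String> m, String x, Map<String, Set<String>> branch) {
        Set<String> deps = branch.get(x);
        if (deps != null) {
            m.addAll(deps);
        } else {
            m.add(x); // assumption of this sketch: the branch leaves x unchanged
        }
    }

    static Map<String, Set<String>> example() {
        // if (c) { x = a; } else { x = b; y = a; }
        Map<String, Set<String>> thenDeps = new HashMap<String, Set<String>>();
        thenDeps.put("x", new TreeSet<String>(Arrays.asList("a")));
        Map<String, Set<String>> elseDeps = new HashMap<String, Set<String>>();
        elseDeps.put("x", new TreeSet<String>(Arrays.asList("b")));
        elseDeps.put("y", new TreeSet<String>(Arrays.asList("a")));
        Set<String> cond = new TreeSet<String>(Arrays.asList("c"));
        return summarize(thenDeps, elseDeps, cond);
    }

    public static void main(String[] args) {
        System.out.println(example());
    }
}
```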

Iteration Statement For simplicity and without loss of generality, here 
we consider only “while” iteration statements. A “while” statement consists of 
an exit boolean expression and a body which is a set of statements executed at 
each iteration. A loop can be viewed as a nesting of identical selection statements 
having an empty “else” branch and the same boolean expression and body of 
the “then” branch as the loop. We start by converting the outermost selection 
statement and by increasing the depth of the nesting until the addition of a new 
selection statement induces no new functional dependencies for variables visi- 
ble outside the iteration statement. The diagnosis component corresponding to 
the iteration statement is the diagnosis component computed for the outermost 
selection statement of the equivalent nesting. 
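
The unrolling can be illustrated with a small Java sketch that deepens the nesting until the dependency sets stop growing. The representation (a map from target variable to its sources, indices ignored) and all names are assumptions made for illustration; `selection` and `sequence` are simplified stand-ins for the conversions described in the text.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Illustrative sketch: dependencies of a "while" loop obtained by nesting
// selection statements until no new dependencies appear.
class LoopDependencies {
    // "if (cond) { thenDeps }" with an empty else branch: each target depends on
    // the condition, on its sources in the branch, and on its own old value.
    static Map<String, Set<String>> selection(Map<String, Set<String>> thenDeps, Set<String> cond) {
        Map<String, Set<String>> result = new TreeMap<String, Set<String>>();
        for (Map.Entry<String, Set<String>> e : thenDeps.entrySet()) {
            Set<String> m = new TreeSet<String>(cond);
            m.addAll(e.getValue());
            m.add(e.getKey());
            result.put(e.getKey(), m);
        }
        return result;
    }

    // "first; second", with all sources expressed over the values before 'first'.
    static Map<String, Set<String>> sequence(Map<String, Set<String>> first,
                                             Map<String, Set<String>> second) {
        Map<String, Set<String>> result = new TreeMap<String, Set<String>>(first);
        for (Map.Entry<String, Set<String>> e : second.entrySet()) {
            Set<String> m = new TreeSet<String>();
            for (String y : e.getValue()) {
                if (first.containsKey(y)) m.addAll(first.get(y)); else m.add(y);
            }
            result.put(e.getKey(), m);
        }
        return result;
    }

    static Map<String, Set<String>> whileLoop(Map<String, Set<String>> body, Set<String> cond) {
        Map<String, Set<String>> current = selection(body, cond);
        while (true) {
            // One more level of nesting: if (cond) { body; <current nesting> }
            Map<String, Set<String>> deeper = selection(sequence(body, current), cond);
            if (deeper.equals(current)) return current; // no new dependencies
            current = deeper;
        }
    }

    static Set<String> set(String... names) {
        return new TreeSet<String>(Arrays.asList(names));
    }

    static Map<String, Set<String>> example() {
        // while (k) { a = b; b = c; }
        Map<String, Set<String>> body = new TreeMap<String, Set<String>>();
        body.put("a", set("b"));
        body.put("b", set("c"));
        return whileLoop(body, set("k"));
    }

    public static void main(String[] args) {
        System.out.println(example());
    }
}
```

In the example, one level of nesting does not yet show that `a` can receive the old value of `c`; the second level adds this dependency and the third adds nothing, so the computation stops.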

Method Call Statement In order to convert a method call statement call- 
ing a method n, the set of dependencies of n is needed. The functional depen- 
dencies of the variables visible outside n (e.g., fields of the formal parameters of 
reference type, fields of the object owning n) are summarized and, after having 
replaced the formal parameters by the actual parameters, side-effect diagnosis 
components containing these functional dependencies are created. If n returns 
a value (i.e., n does not have return type “void”), the summarized functional 
dependency corresponding to the return statements of n is used for the
conversion of the statement where the method call of n occurred.

Above, we have supposed that the dependencies of a method n are available
before a method call of n is encountered in the body of a method m. Unfortunately,
this is not the case in general. In the case of mutually recursive methods, e.g., m
calling n and n calling m, an infinite sequence of alternating calls to the procedures
for computing the dependencies for n and m would arise. Therefore, a fixpoint
computation is adopted. For each group S of mutually recursive methods, the
fixpoint computation algorithm starts by converting an arbitrary method m of S.
The method m may have been chosen either (i) because it has been called in the
body of a method not belonging to S before the set S has been processed, or (ii)
because it was the first method stored in S. An initial dependency set $M_0$ of each
method m from S is computed by ignoring the blocks of statements in the body
of m that contain calls to methods from S. In the i-th iteration of the fixpoint
computation, in order to compute the i-th dependency set $M_i$ for a method m
from S, the dependencies $N_{i-1}$ of the methods n from S computed at the previous
iteration are used. The fixpoint is reached when the dependency sets of all
methods of S computed in the current iteration contain no new dependencies.
Since the program has a finite number of methods and every method of the
program has a finite number of statements, and since a dependency cannot be
alternately validated and invalidated at different iterations, the
fixpoint exists and is reached in a finite amount of time.
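
A strongly simplified Java sketch of such a fixpoint iteration is given below. It is not the authors' algorithm: the substitution of actual for formal parameters is left out, each method is reduced to one dependency set, and the names `MethodFixpoint`, `direct`, and `calls` are hypothetical. The map `direct` plays the role of the initial sets $M_0$, and each round uses only the sets of the previous round.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Illustrative sketch: dependency sets of mutually recursive methods
// computed by iterating until no set gains a new element.
class MethodFixpoint {
    // direct.get(m): dependencies of m when calls into the group are ignored.
    // calls.get(m):  methods of the group that are called by m.
    static Map<String, Set<String>> solve(Map<String, Set<String>> direct,
                                          Map<String, Set<String>> calls) {
        Map<String, Set<String>> current = new TreeMap<String, Set<String>>();
        for (Map.Entry<String, Set<String>> e : direct.entrySet()) {
            current.put(e.getKey(), new TreeSet<String>(e.getValue()));
        }
        boolean changed = true;
        while (changed) {
            changed = false;
            Map<String, Set<String>> next = new TreeMap<String, Set<String>>();
            for (String m : current.keySet()) {
                Set<String> deps = new TreeSet<String>(current.get(m));
                Set<String> callees = calls.get(m);
                if (callees != null) {
                    for (String n : callees) {
                        Set<String> calleeDeps = current.get(n); // previous iteration
                        if (calleeDeps != null) deps.addAll(calleeDeps);
                    }
                }
                if (deps.size() > current.get(m).size()) changed = true;
                next.put(m, deps);
            }
            current = next;
        }
        return current;
    }

    static Set<String> set(String... names) {
        return new TreeSet<String>(Arrays.asList(names));
    }

    static Map<String, Set<String>> example() {
        // m calls n, n calls m, o calls m
        Map<String, Set<String>> direct = new TreeMap<String, Set<String>>();
        direct.put("m", set("a"));
        direct.put("n", set("b"));
        direct.put("o", set("c"));
        Map<String, Set<String>> calls = new TreeMap<String, Set<String>>();
        calls.put("m", set("n"));
        calls.put("n", set("m"));
        calls.put("o", set("m"));
        return solve(direct, calls);
    }

    public static void main(String[] args) {
        System.out.println(example());
    }
}
```

Because sets only grow and are drawn from a finite universe of names, the loop terminates, which mirrors the termination argument given in the text.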






 1. class SWExamples {
 2.   public static void test(int a, int b, int c, int d, int e) {
 3.     int f, g, s1, s2, s3;
 4.     s1 = a * c;
 5.     s2 = b * d;
 6.     s3 = c * e;
 7.     f = s1 + s2;
 8.     g = s2 + s3;
 9.   }
10. }

(a) Source code

Line  Environment
 2.   test(3,2,2,3,3)
      a_test = 3, b_test = 2, c_test = 2, d_test = 3, e_test = 3
 3.
 4.   s1_test = 6
 5.   s2_test = 6
 6.   s3_test = 6
 7.   f_test = 12
 8.   g_test = 12
 9.
10.   test(3,2,2,3,3) = void

(b) Evaluation trace for test(3,2,2,3,3)

Fig. 1. A simple Java method



2.3 Mapping Functional Dependencies to a Logical Model 

In the previous section we described how the functional dependencies can be
derived using only the syntax of a given Java program. In this section we take
the dependencies and show the resulting logical model. We assume that the
statements are given as a set COMP, and that functional dependencies are
defined for all statements. The function fd is used for accessing dependencies.
Functional dependencies describe behavior implicitly by describing influences
between variables. Instead of speaking about real values, we can only speak
about whether a value v is correct (written as $ok(v)$) or not (written $nok(v)$).
We can further state that if a statement s, i.e., a diagnosis component, behaves
correctly (i.e., $\neg AB(s)$ holds) and all input variables have a correct value, then
the value of variables used as target in an assignment statement must be correct.
Formally, the system description is given by:

$$\forall (o,M) \in fd(C):\ \Big[\neg AB(C) \wedge \bigwedge_{x \in M} ok(x) \rightarrow ok(o)\Big] \in SD$$

where $C \in COMP$ is a statement. In addition, we know that it is impossible that
any variable value is known to be correct and incorrect at the same time.
Therefore, we have to add the rule $ok(v) \wedge nok(v) \rightarrow \bot$ to the model, where v
denotes a variable (with index) used in the program. The above model can be used
for computing bug locations. However, since the model does not allow back
propagation, the standard measurement selection algorithm [3] sometimes may not
be able to provide a meaningful distinction between variable values that the user
is to be queried for. Therefore, we introduce additional rules to avoid this problem.
Since ok and nok are semantically too strong, we use new predicates pok and
pnok to say that a value is possibly ok and possibly nok, respectively. The new
predicates are weaker in the sense that the pok and pnok facts do not lead to
a contradiction. The additional rules are:

$$\forall (o,M) \in fd(C):\ \Big[\neg AB(C) \wedge pok(o) \rightarrow \bigwedge_{x \in M} pok(x)\Big] \in SD$$
$$\forall (o,M) \in fd(C):\ \Big[\neg AB(C) \wedge pnok(o) \rightarrow \bigwedge_{x \in M} pnok(x)\Big] \in SD$$

As far as the measurement selection algorithm is concerned, we place diagnoses
leading to ok and pok (and nok and pnok) together in one class.

Apart from the system description, the MBD approach requires a set of
observations OBS. For software debugging, the observations are given by the
specified behavior, in our case the expected input/output vectors. By comparing
the specified output with the computed output, we can classify the correctness
of variables. Variables v that are assumed to have the correct value lead to the
observation $ok(v) \wedge pok(v)$. Variables with an incorrect value are represented by
$nok(v) \wedge pnok(v)$.

3 Debugging with the Dependency Model 

In this section we show how the functional dependency model can be used to 
locate the fault. We use the Java program given in Figure 1 as our running 
example. Figure 1 shows the evaluation trace for the call test(3,2,2,3,3). The 
trace only presents the lines of code which are involved in the current evaluation, 
and the new environments created. To distinguish different local variables they 
are indexed with the name of the method where they are declared. The return
value of the method is also depicted, e.g., test(3,2,2,3,3) = void. Most debuggers
in use nowadays guide the user through an evaluation trace. If an error occurs
early in the trace, this approach can be used effectively. Otherwise, however,
it leads to the examination of unnecessary values and possibly to wrong
values being overlooked. Therefore, debuggers allow the user to set break points.
Obviously, this helps in some cases, but choosing the right location is not easy
and requires an experienced user. In addition, for recursive functions a break point
may be passed quite often during program execution in situations where the
state of the program does not tell us anything about the fault.

The functional dependency model can be used to avoid the problems of a 
traditional debugger. Necessary for our approach is the availability of the source 
code, information about the expected outcome of a program, and the evaluation 
trace up to the detection of a misbehavior. From the source code the functional 
dependency model can be automatically derived. From this model and the mis- 
behavior we can compute those statements which may be responsible for the 
wrong values. We illustrate our approach using the small example from Figure 1. 
The functional dependencies for the five statements (ignoring indices) are given
by:

$$FD(C_4) = \{(s1, \{a, c\})\},\quad FD(C_5) = \{(s2, \{b, d\})\},\quad FD(C_6) = \{(s3, \{c, e\})\},$$
$$FD(C_7) = \{(f, \{s1, s2\})\},\quad FD(C_8) = \{(g, \{s2, s3\})\}$$

where $C_i$ denotes the statement in line i. From the dependencies, we get the
following model SD:

$$
\begin{array}{l}
\neg AB(C_4) \wedge ok(a) \wedge ok(c) \rightarrow ok(s1) \\
\neg AB(C_4) \wedge pnok(s1) \rightarrow (pnok(a) \wedge pnok(c)) \\
\neg AB(C_4) \wedge pok(s1) \rightarrow (pok(a) \wedge pok(c)) \\
\neg AB(C_5) \wedge ok(b) \wedge ok(d) \rightarrow ok(s2) \\
\neg AB(C_5) \wedge pnok(s2) \rightarrow (pnok(b) \wedge pnok(d)) \\
\neg AB(C_5) \wedge pok(s2) \rightarrow (pok(b) \wedge pok(d)) \\
\neg AB(C_6) \wedge ok(c) \wedge ok(e) \rightarrow ok(s3) \\
\neg AB(C_6) \wedge pnok(s3) \rightarrow (pnok(c) \wedge pnok(e)) \\
\neg AB(C_6) \wedge pok(s3) \rightarrow (pok(c) \wedge pok(e)) \\
\neg AB(C_7) \wedge ok(s1) \wedge ok(s2) \rightarrow ok(f) \\
\neg AB(C_7) \wedge pnok(f) \rightarrow (pnok(s1) \wedge pnok(s2)) \\
\neg AB(C_7) \wedge pok(f) \rightarrow (pok(s1) \wedge pok(s2)) \\
\neg AB(C_8) \wedge ok(s2) \wedge ok(s3) \rightarrow ok(g) \\
\neg AB(C_8) \wedge pnok(g) \rightarrow (pnok(s2) \wedge pnok(s3)) \\
\neg AB(C_8) \wedge pok(g) \rightarrow (pok(s2) \wedge pok(s3)) \\
ok(v) \wedge nok(v) \rightarrow \bot \quad \text{for each } v \in \{a, b, c, d, e, f, g, s1, s2, s3\}
\end{array}
$$

In this example we assume that the method call test(3,2,2,3,3) should lead to
values f=12 and g=0, i.e., line 8 should be g=s2-s3 instead of g=s2+s3. For
this case we get the observations

$$OBS:\ ok(a) \wedge ok(b) \wedge ok(c) \wedge ok(d) \wedge ok(e) \wedge ok(f) \wedge nok(g) \wedge pok(f) \wedge pnok(g).$$

Using $SD \cup OBS$ we get 3 diagnoses, each pinpointing a single possible bug
location: $\{C_5\}$, $\{C_6\}$, $\{C_8\}$. The other statements
can be ignored in this case. This initial result is similar to the one obtained by
other techniques [16]. Using the measurement selection algorithm from [3] we
can compute the optimal next question to ask the user in order to distinguish
between the 3 candidates. The algorithm is based on minimizing the entropy and
requires the existence of fault probabilities. For our example we assume that
all statements are equally likely to fail, with probability 0.1. Using only the 3
diagnoses and computing possible values we get the following results, including
the entropy value HE for the variables:



Variable   ok(V) ∨ pok(V)   nok(V) ∨ pnok(V)   HE
s1         C5, C6, C8                          -0.31993
s2         C5, C6, C8       C5, C6             -0.58642
s3         C5, C8           C5, C6             -0.53298



Note that the values ok/pok (and nok/pnok) are combined because of their
similar semantics. So, it is recommended first to ask about the value of s2 and
afterwards about the value of s3. A conversation with a debugger might look like:

Debugger: Is the outcome of statement line 5 ok? [yes/no]
User: yes
..after recomputing the diagnoses and optimal measurements..
Debugger: Is the outcome of statement line 6 ok? [yes/no]
User: yes
..again after computing the new diagnoses..
Debugger: Bug in assignment [line 8]. Have a look at the expression.

This small example shows that it is not necessary to go through the whole
evaluation trace. It is sufficient to look at the statements that may influence wrong
values. This process can be guided by the debugger in an optimal fashion using
fault probabilities. It should be noted that the dependency-based representation
allows diagnosis times in the seconds range even for very large programs [4].
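
The three single-fault candidates of the running example can be reproduced with a short Java sketch. It is a deliberately reduced consistency check, not the diagnosis algorithm used by the authors: only single-component candidates are tried, only the forward rules for ok are applied (the pok/pnok rules never cause a contradiction), and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustrative sketch: single-fault, consistency-based diagnosis for the
// dependency model of the method test in Fig. 1.
class SingleFaultDiagnosis {
    // One statement: its output variable and the variables it depends on.
    static final class Component {
        final String name;
        final String output;
        final List<String> inputs;

        Component(String name, String output, String... inputs) {
            this.name = name;
            this.output = output;
            this.inputs = Arrays.asList(inputs);
        }
    }

    static List<String> diagnose(List<Component> comps, Set<String> okObs, Set<String> nokObs) {
        List<String> diagnoses = new ArrayList<String>();
        for (Component suspect : comps) {
            // Assume 'suspect' is abnormal and every other component is correct.
            Set<String> ok = new HashSet<String>(okObs);
            boolean changed = true;
            while (changed) {
                changed = false;
                for (Component c : comps) {
                    // Rule: not AB(C) and all inputs ok implies output ok.
                    if (c != suspect && ok.containsAll(c.inputs) && ok.add(c.output)) {
                        changed = true;
                    }
                }
            }
            // Consistent iff no variable is both derived ok and observed nok.
            if (Collections.disjoint(ok, nokObs)) {
                diagnoses.add(suspect.name);
            }
        }
        return diagnoses;
    }

    static List<String> example() {
        List<Component> comps = Arrays.asList(
                new Component("C4", "s1", "a", "c"),
                new Component("C5", "s2", "b", "d"),
                new Component("C6", "s3", "c", "e"),
                new Component("C7", "f", "s1", "s2"),
                new Component("C8", "g", "s2", "s3"));
        Set<String> okObs = new HashSet<String>(Arrays.asList("a", "b", "c", "d", "e", "f"));
        Set<String> nokObs = new HashSet<String>(Arrays.asList("g"));
        return diagnose(comps, okObs, nokObs);
    }

    public static void main(String[] args) {
        System.out.println(example()); // prints [C5, C6, C8]
    }
}
```

Assuming C4 or C7 to be faulty still lets ok(g) be derived through C5, C6, and C8, which contradicts the observation nok(g); this is why only the statements in lines 5, 6, and 8 remain as candidates.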

4 Conclusion 

This paper has described a dependency-based method for diagnosing Java pro- 
grams. Once the behavior of the individual components has been defined, the 
model-based approach enables an automatic conversion of arbitrary application 
problems into a logical representation suitable for diagnosis. The approach is 
powerful enough to represent iterative code including recursive functions, ab- 
stract enough to allow diagnosis runs for large programs, and generic enough to 
be applicable to different imperative languages although individual features may 
differ from what we have presented here for Java. This work builds on our pre- 
vious experience with building model-based debugging systems for VHDL and 
functional languages. The main limitation of the representation shown here is 
that discrimination is limited, but this can be avoided as shown in our example 
by using debugging results to guide the developer through the evaluation trace 
in a manner comparable to conventional debuggers, but with semantic guidance. 
It is also possible to incorporate more detailed models in a comparable manner
to our VHDL-oriented diagnosis systems, an approach we have not presented
here for space reasons. Overall, the model-based approach provides an effective 
framework for modeling, a flexible, non-brittle approach to diagnosis that can ac- 
commodate multiple faults and provide help in measurement selection based on 
the structure of each individual application program. It also offers relatively high 
independence of actual diagnosis algorithms, and the ability to combine multiple 
models for more detailed reasoning about the sources of erroneous behavior in 
programs. 

References 

1. Gregory W. Bond. Logic Programs for Consistency-Based Diagnosis. PhD thesis,
Carleton University, Faculty of Engineering, Ottawa, Canada, 1994.

2. Luca Console, Gerhard Friedrich, and Daniele Theseider Dupré. Model-based
diagnosis meets error diagnosis in logic programs. In Proc. IJCAI, pages 1494-1499,
Chambéry, August 1993. Morgan Kaufmann.

3. Johan de Kleer and Brian C. Williams. Diagnosing multiple faults. Artificial
Intelligence, 32(1):97-130, 1987.

4. Gerhard Friedrich, Markus Stumptner, and Franz Wotawa. Model-based diagnosis
of hardware designs. Artificial Intelligence, 111(2):3-39, July 1999.

5. Peter Fritzson and Henrik Nilsson. Algorithmic debugging for lazy functional
languages. Journal of Functional Programming, 4(3), 1994.

6. Daniel Jackson. Aspect: Detecting Bugs with Abstract Dependences. ACM
Transactions on Software Engineering and Methodology, 4(2):109-145, April 1995.

7. Bogdan Korel. PELAS-Program Error-Locating Assistant System. IEEE
Transactions on Software Engineering, 14(9):1253-1260, 1988.

8. Ron I. Kuper. Dependency-directed localization of software bugs. Technical Report
AI-TR 1053, MIT AI Lab, May 1989.

9. Cristinel Mateis, Markus Stumptner, and Franz Wotawa. Debugging of Java
Programs using a Model-Based Approach. In Proceedings of the Tenth International
Workshop on Principles of Diagnosis, Loch Awe, Scotland, 1999.

10. William R. Murray. Automatic Program Debugging for Intelligent Tutoring
Systems. Pitman Publishing, 1988.

11. Henrik Nilsson. Declarative Debugging for Lazy Functional Languages. PhD thesis,
Linköping University, April 1998.

12. Raymond Reiter. A theory of diagnosis from first principles. Artificial Intelligence,
32(1):57-95, 1987.

13. Ehud Shapiro. Algorithmic Program Debugging. MIT Press, Cambridge,
Massachusetts, 1983.

14. Markus Stumptner and Franz Wotawa. Debugging Functional Programs. In Proc.
IJCAI, Stockholm, Sweden, August 1999.

15. Markus Stumptner and Franz Wotawa. Detecting and locating faults in hardware
designs. In AAAI 99 Workshop on Intelligent Software Engineering, Orlando,
Florida, 1999.

16. Mark Weiser. Program slicing. IEEE Transactions on Software Engineering,
10(4):352-357, July 1984.

17. Franz Wotawa. New Directions in Debugging Hardware Designs. In Proceedings
IEA/AIE, 1999.



Application of a Real-Time Expert System for Fault Diagnosis

Chr. Angeli

Technological Education Institute of Piraeus
Konstantinoupoleos 38, 171 21 Athens, Greece
angeli@compulink.gr



Abstract. Fault diagnosis is an increasingly important research topic and many
approaches have been investigated for the intelligent on-line diagnostic task.
This work demonstrates the implementation of real-time expert systems 
technology for diagnostic purposes in hydraulic systems. A knowledge-based 
system is developed that in combination with sensor measurements is able to 
diagnose real-time faults. The system was developed in collaboration with a 
company which specialises in hydraulic systems and is used for industrial 
automated processes. 



1 Introduction 

The trend of developing automatic procedures towards more complexity, higher 
performance and more rigorous security requirements of industrial processes has 
invoked an ever increasing demand for real-time diagnostic knowledge-based systems. 

In these systems the knowledge engineering task comprises different knowledge
sources and structures. Several paradigms have been proposed for the investigation
and automation of these activities in various domains [1,2,3,4,5,6,7,8,9,10].
Knowledge-based fault detection systems for hydraulic systems have been reported
by [11] and [12].

Problems and difficulties in the development of real-time knowledge-based systems
arise from the facts that theoretical modelling techniques incorporated in systems are
not suitable for on-line performance of a system and traditional techniques in
knowledge base development are not useful for on-line systems. New methodologies
and new tools need to be developed for the efficient interaction of numeric and
symbolic computing.

This work presents the on-line implementation of a diagnostic system that operates
in parallel to production machines, diagnoses real-time faults and proposes an
effective interactive environment by combining different tools for the handling of
numerical and symbolic data.



R. Loganantharaj et al. (Eds.): IEA/AIE 2000, LNAI 1821, pp. 184-191, 2000.
© Springer-Verlag Berlin Heidelberg 2000







2 The Design of the Diagnostic System 

The diagnostic system is built around an actual electro-hydraulic system. A data
acquisition, monitoring and control module is responsible for generating signals and
for converting the signals coming from the actual hydraulic system into a format
accepted by the computer, as well as for the analysis and presentation of the signal
information. Measurable quantities of the variables corresponding to the pressure at
critical points of the hydraulic system and the angular velocity of the hydraulic motor,
as well as digital input signals referring to the functional condition of the system, are
transferred to the expert system for the decision making process.

The inputs to the expert system are the information produced by the analogue and
digital signals of the sensors built into the actual system, obtained through the data
acquisition process, as well as the results of the system simulation. A discrepancy
between the behaviour of the model and the behaviour of the real system indicates
that the system has undergone a failure. The expert system compares the measured
quantities with the corresponding simulation results, detects an equipment fault and
diagnoses the source of the fault.



3 Implementation 



3.1 The Electrical Control System 

The main electric devices of the hydraulic system are the electric motor, the electric
pilot of the main relief valve, the proportional pressure valve for the control of the
main pressure, the proportional 4-way valve for the control of the motor speed,
pressure transducers for the measurement of the pressure and the logic levels for the
pressure switch PS1, the pressure switch for the indication of the fluid contamination
FLT and the oil level switch OLS. Electrical input signals are applied to the electric
motor, to the electric pilot of the main relief valve, to the proportional pressure valve
and to the proportional 4-way valve. Electrical output signals are produced by the
pressure transducers, the pressure switch PS1, the filter indicator FLT and the oil level
indicator OLS. The basic system includes the electronic components and the hydraulic
system. To add a fault diagnosis capability to the experimental studies, an I/O card
and the computer are added.

The layout in Fig. 1 represents the interaction between the modules of the system.

The measurement of pressure is performed by three pressure transducers and the 
measurement of the angular velocity is performed by an incremental rotary encoder. 




[Figure: block diagram with blocks labelled "CPT", "I/O BOARD", "ELECTRONIC INTERFACE" and "HYDRAULIC SYSTEM", connected by bidirectional arrows]

Fig. 1. The experimental control system

The I/O card used is a 12-bit Multi-I/O PC card with 16 digital inputs, 16 digital
outputs, 16 analogue inputs and 4 analogue outputs, and is controlled by the DASYLab
software. The maximum sampling rate is 320 kHz, which is divided among the activated
analogue channels. This card can be programmed in C or with the chosen compatible
Windows data acquisition software.



3.2 The Electronic Interface 

For the connection of the electrical devices of the hydraulic system and the 
measurement instruments with the built-in I/O board an electronic interface was 
developed and constructed. This interface is a combination of an electronic system 
and a terminal box for the connection of the electro-hydraulic devices with the 
computer. 

The electronic interface consists mainly of a multiple power supply module, a 
module with the relays for the input and output signals and the amplifier cards for the 
control of the proportional valves. 

The electronic interface is constructed using a 19” EURO-magazin. This interface 
can be extended for use with other proportional hydraulic systems. 

The front panel of the electronic interface is illustrated in Fig. 2. The layout of the 
rear panel is shown in Fig. 3. 











Fig. 2. Front panel of the electronic interface (modules: multi power supply AC 220 V, digital output relays, digital input relays, amplifier VT5005, amplifier VT2013, power supply unit (DC)) 









Fig. 3. Rear panel of the electronic interface (labelled connectors include EMOT, FLT, WRA, PS1, HDA, WRB, PILOT, I/O and AC 220 V) 

This panel includes the connections to the electric motor (EMOT), the electric pilot 
of the main relief valve (PILOT), the filter (FLT), the pressure switch (PS1), the oil 
level switch (OLS), the pressure transducers (HDO, HDA, HDB), the solenoids and 
the feedback of the proportional 4-way valve (WRA, WRB, WRF), the solenoid of the 
proportional pressure valve (DBE) and the I/O card. 



3.3 The Data Acquisition System 

In normal operation the hydraulic motor takes approximately 0.4 s to change its 
speed from a low value corresponding to U2 = 1 V to a high value corresponding to 
U2 = 6 V. Any fault that occurs in the system can affect both the dynamic and the 
steady state of the system. Thus, if data are taken over the period from 0 to 0.4 s after 
the change of voltage U2 has been applied, both the dynamic and the steady condition 
should be determinable. In this work the rising part of the response curve is used for 
fault detection (Fig. 4). 




Fig. 4. Time period for acquiring data (time axis ticks: 0.0, 0.4, 0.8, 15.0, 15.4, 15.8 s) 



The data are acquired by a data acquisition system implemented with the DASYLab 
software. DASYLab is a graphical programming package that takes 
advantage of the features and the graphical interface provided by Microsoft Windows. 
It provides an “intuitive” operating environment that offers data 
analysis functions, a high signal processing speed, an effective graphical display of 
results and data presentation facilities. A measuring task can be set up directly on 
the screen by selecting and connecting modular elements, which can then be freely 
arranged. 

This data acquisition system runs in parallel with the actual system and accurately 
acquires the angular velocity and the pressures for the 0.4 s during which the 
speed is changing. 

The simulation program calculates the values of the angular velocity and the 
pressures for the same 0.4 s period. 

The comparison of the measured and the calculated values is performed by the expert 
system. 

Since the acquired data refer to the dynamically changing state, and the actual 
system operates with a periodic voltage U2 that is applied every 15 s, there is 
enough time for the modules of the system to run: the data acquisition system 
runs for 3 s, the simulation for 0.4 s and the expert system for 4 s. 
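The timing argument can be checked with a short calculation. The sketch below uses only the durations quoted in the text and assumes, as a worst case that the paper does not state explicitly, that the three modules run one after another within a single 15 s cycle; the variable and function names are illustrative.

```python
# Durations in seconds, taken from the text.
CYCLE_PERIOD_S = 15.0
MODULE_RUNTIMES_S = {
    "data acquisition": 3.0,
    "simulation": 0.4,
    "expert system": 4.0,
}


def cycle_slack(period_s, runtimes_s):
    """Time left in one cycle if all modules run sequentially (assumption)."""
    return period_s - sum(runtimes_s.values())


slack = cycle_slack(CYCLE_PERIOD_S, MODULE_RUNTIMES_S)
print(f"busy: {sum(MODULE_RUNTIMES_S.values()):.1f} s, slack: {slack:.1f} s")
```

Under this sequential assumption the modules occupy 7.4 s of each cycle, leaving about half of the period unused.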



3.4 The Diagnostic Task by the Expert System 

The diagnostic conclusions are drawn by the expert system. Integration of 
scientific and experiential knowledge in a diagnostic system leads to the construction 
of a more accurate model of expertise for the problem solving process. This is because 
compiled knowledge alone is often not available to a decision maker at sufficient depth, 
and deep knowledge is needed to fill the gaps left by the compiled knowledge of a 
problem solver. Combining the two sources of knowledge therefore provides additional 
depth. 

In this expert system the experiential knowledge is used as a complement to the 
scientific knowledge of the mathematical model in order to model the expert’s 
reasoning more precisely and to gain the efficiency of heuristics and the advantages of a 
real world application. The knowledge acquisition phase resulted in a set of knowledge 
diagrams that led to a decision tree in the knowledge representation phase. This 
technique was used to define the various logical paths that the knowledge base must 
follow to reach conclusions. The experiential knowledge and the scientific knowledge 
co-operatively solve different parts of the overall problem domain. Empirical 
knowledge is particularly useful in the diagnostic phase, since the target is not only to 
declare a faulty behaviour of the system but also to find the specific faulty element and 
to propose specific actions. Scientific knowledge is used for representing the dynamic 
behaviour of the hydraulic system as well as for predicting and detecting faults, 
while the empirical knowledge is used for isolating and diagnosing faults. The 
interaction between the two types of knowledge is driven by the current problem 
solving circumstances. 







The representation and the on-line interaction of all these types of knowledge 
require a suitable environment. The combination of the capabilities of two different 
environments offers a suitable tool. The KPWin development environment is used for 
the symbolic representation of the empirical knowledge and of the part of the scientific 
knowledge embedded in the circuit diagrams. The first part of the expert system, 
developed with the DASYLab software, is used for the numerical calculations, the 
representation of scientific knowledge coming on-line from the sensors, the display of 
the simulation results and the comparison of the results. 

The interaction of the various sources of knowledge was realised by a knowledge 
representation scheme provided by the high level object oriented language 
KnowledgePro, supported by the KPWin++ development environment: the “topic”, which 
offers the capabilities of a traditional object while also serving as a set of expert 
system rules. 

In this expert system rules are embedded in topics, so that the structure of the final 
application is a collection of topics. Rules that refer to general assumptions and 
correspond to specific branches of the decision tree are grouped and embedded in a 
specific topic. Within a “topic”, knowledge stored in rules can interact with 
external information from files that come directly from the data acquisition system or 
are pre-processed by the first part of the expert system. 

States that can be measured relatively easily are handled by digital input signal 
information. This digital input signal information is transformed into qualitative text 
files that interact directly with the knowledge base of the expert system. 

In the case that multiple faults occur in the system, topics related to other elements 
that are possibly involved in the fault are called and checked before the final 
diagnosis is declared. For this task the text file information that comes on-line from 
the digital input signals of the system is particularly useful. These files are normally 
checked first to eliminate the possibility of multiple faults, but their topics can be 
called at any time. 

A deviation between the behaviour of the model and the behaviour of the real system 
indicates that the system has undergone an anomaly. The differences between the values 
of the pressure at specific points of the actual system and of the angular velocity, and 
their corresponding values from the simulation of the fault free system, are taken 
as the fault criterion. These differences are compared to predefined thresholds. The 
thresholds represent an acceptable deviation from the ideal system and are determined 
from experiential knowledge according to the needs of the application. 
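The fault criterion described here amounts to thresholding the residual between measured and simulated signals. The following is a minimal sketch of that idea only, not the DASYLab/KPWin implementation of the paper; the function name, the signal names, the sample values and the thresholds are all hypothetical, and the peak absolute difference over the acquisition window is an assumed choice of residual.

```python
def detect_deviations(measured, simulated, thresholds):
    """Return the signals whose peak |measured - simulated| exceeds its threshold.

    Each argument maps a signal name to samples (or to a threshold) covering
    the same acquisition window.
    """
    flagged = []
    for name, limit in thresholds.items():
        residual = max(abs(m - s) for m, s in zip(measured[name], simulated[name]))
        if residual > limit:
            flagged.append(name)
    return flagged


# Hypothetical samples over the 0.4 s window (arbitrary units).
measured = {
    "angular_velocity": [10.0, 14.0, 18.0],
    "pressure_A": [50.0, 52.0, 55.0],
}
simulated = {
    "angular_velocity": [10.0, 16.0, 22.0],
    "pressure_A": [50.5, 52.5, 55.0],
}
thresholds = {"angular_velocity": 2.0, "pressure_A": 1.0}

print(detect_deviations(measured, simulated, thresholds))
```

In this made-up data only the angular velocity leaves its tolerance band, which would correspond to the reduced-speed symptom that is then passed to the rule base for isolation.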

Deep and shallow reasoning [13] are used co-operatively for the final decision. 
Deep reasoning determines potential causes, while shallow reasoning aims directly 
and quickly at the specific cause of the anomaly. The knowledge base is organised in 
rules. Domain specific knowledge is represented in a form suitable for performing 
the diagnosis of faults and for explaining the reasoning procedure of the system. 







4 A Diagnostic Example 

In the following, a diagnostic example is presented that illustrates the function of the 
expert system as well as the procedure of recognising, detecting and diagnosing a fault. In 
this example the symptom is a reduced speed of the hydraulic motor. The fault is 
simulated by producing an artificial leakage through a flow regulator. The output 
signal of the rotary encoder is used to calculate the angular velocity, and the 
values are written to a file. These values and the corresponding values from the 
simulation are compared in the expert system. The results are written to files that 
interact with the knowledge base of the expert system, which detects an “extended 
motor leakage” fault. Fig. 5 displays the diagnosis screen for this fault. 



Fig. 5. Diagnosis screen in the case of an extended motor leakage fault (a "Fault Diagnosis" window reporting "Extended motor leakage", with Explain, Close, Continue and Help buttons) 



5 Conclusion 

In this paper, a real-time diagnostic expert system is presented that has been 
developed by combining on-line sensor information with knowledge-based techniques. 
The system enables quick and reliable detection of equipment faults, triggers diagnostic 
activities and diagnoses the source of faults. The diagnostic results are accurate and 
the method is applicable to real world situations. The benefits gained for hydraulic 
systems by detecting faults in real time are mainly financial, together with a higher 
quality of the maintenance process. 






References 

1. Frank, P.: Fault Diagnosis in Dynamic Systems Using Analytical and 
Knowledge-based Redundancy - A Survey and Some New Results. 
Automatica, Vol. 26 (1990) 459-474 

2. Basseville, M., Nikiforov, I.: Detection of Abrupt Changes: Theory and 
Application. Prentice Hall (1993) 

3. Patton, R., Frank, P., Clark, R.: Fault diagnosis in dynamic systems. Theory and 
application. Prentice Hall (1989) 

4. Surgenor, B., Jofriet, P.: Thermal fault analysis and the diagnostic model 
processor. In: IFAC Symposium, On-line fault detection and supervision in the 
chemical process industries, Eds. Dhurjati, P. and Stephanopoulos, G. No. 1, 
Pergamon Press (1993) 

5. Rengaswamy, R., Venkatasubramanian, V.: An integrated framework for process 
monitoring, diagnosis, and control using knowledge-based systems and neural 
networks. In: IFAC Symposium, On-line fault detection and supervision in the 
chemical process industries, Eds. Dhurjati, P. and Stephanopoulos, G. No. 1, 
Pergamon Press (1993) 

6. Kordon, A., Dhurjati, P.: An expert system for crude unit process 
supervision. IFAC (International Federation of Automatic Control), On-line fault 
detection and supervision in the chemical process industries, Eds. Morris, A. and 
Martin, E., Pergamon (1996) 

7. Harris, T., Seppala, C., Jofriet, P., Surgenor, B.: Plant-wide feedback control 
performance assessment using an expert system framework. On-line fault 
detection and supervision in the chemical process industries, Eds. Morris, A. and 
Martin, E., Pergamon (1996) 

8. Heiming, B., Lunze, J.: Parallel Knowledge-Based Process Diagnosis 
Applied to a Local Power Station Plant. In: IFAC Symposium, Fault Detection, 
Supervision and Safety for Technical Processes, Kingston Upon Hull, UK (1998) 
1113-1118 

9. Norvilas, A., Negiz, A., DeCicco, J., Cinar, A.: Intelligent Process Monitoring by 
Interfacing Knowledge-Based Systems and Multivariate SPC Tools. In: IFAC 
Symposium, Fault Detection, Supervision and Safety for Technical Processes, 
Kingston Upon Hull, UK (1998) 43-48 

10. Li, I.S., Kwok, K., Zurcher, J.: Prediction and Prevention of Sheetbreak Using 
PLS and an Expert System. In: IFAC Symposium, Fault Detection, Supervision 
and Safety for Technical Processes, Kingston Upon Hull, UK (1998) 1159-1164 

11. Weule, H., Noe, T.: Expertensysteme: Stand der Technik und 
Einsatzmoeglichkeiten in der Hydraulik. o+p "Oelhydraulik und Pneumatik" 33, 
Nr. 6 (1989) 

12. Angeli, C., Chatzinikolaou, A.: An expert system approach to hydraulic 
systems. Expert Systems, Vol. 12 (1995) 

13. Tzafestas, S.: System Fault Diagnosis Using the Knowledge-Based Methodology. 
In: Patton, R., Frank, P., Clark, R. (Eds.): Fault diagnosis in dynamic systems. 
Theory and application. Prentice Hall (1989) 




Operative Diagnosis Algorithms for Single-Fault 
in Graph-Based Systems 



Mourad Elhadef¹, Bechir El Ayeb¹, and Nageswara S.V. Rao² 



¹ Department of Mathematics and Computer Science 
University of Sherbrooke, Quebec, Canada 
² Computer Science and Mathematics Division 
Oak Ridge National Laboratory, Oak Ridge, TN 37831, USA 



Abstract. A number of real-life systems can be modeled, to a certain 
level of abstraction, as directed graphs for the purpose of fault diagnosis. 
In these graphs, system components are represented as nodes and the 
fault propagation between any two nodes is represented by an edge 
between them. Alarms are attached to some components to detect faults. 
The fault diagnosis problem deals with computing the set of all potential 
faulty components, $P_S$, corresponding to a set of ringing alarms $A_R$. 
Exactly one component can become a fault source at any time, and the 
fault can subsequently propagate to others. In this paper, we present two 
algorithms for the single fault diagnosis problem which perform better 
than existing algorithms under different conditions. 



1 Introduction 

Fault diagnosis methods have been extensively used in a wide variety of systems 
such as process plants [11], chemical industries [5], and aircraft systems [1]. The basic 
goal of fault diagnosis is to identify faulty components in a system based on the 
sensory information. Clearly, it is not always possible to meet this goal. For 
example, when all sensors are faulty, no information is available to perform the 
diagnosis. Thus, some restrictions on the number of faulty units in a system 
have to be assumed. Here, we consider that only one component can be the fault 
source, although the fault can propagate to other components. Also, all sensors 
are assumed to be fault free. 

There is a myriad of methods employed to solve diagnosis problems, such 
as rule-based systems [5], diagnostic reasoning based on structure and behavior, 
graph theoretic methods [7], diagnosis from first principles [8], estimation 
methods, and fault trees (to name a few). One of the most extensively studied is the 
fault propagation graph method, which has been shown to be useful in a number 
of practical applications. In the fault propagation graph, the system components 
are represented by nodes, and the fault propagation between two components 
is represented by an edge. Some of the components are equipped with alarms 
that become active in response to faulty conditions at the component. An active 
alarm represents a fault which either originated at the component or propagated 
from some other faulty node via the edges. The fault diagnosis problem deals with 



R. Loganantharaj et al. (Eds.): IEA/AIE 2000, LNAI 1821, pp. 192-197, 2000. 
© Springer-Verlag Berlin Heidelberg 2000 






computing the set of all potential failure sources, $P_S$, that correspond to a set 
of ringing alarms. In the single fault diagnosis problem, only one component can 
be the fault source at any time, although the fault could have propagated to a 
number of other nodes, thereby activating several alarms. Variants of this model 
have been extensively utilized in practical applications [1, 6]. 

One of the main advantages of the fault propagation graph method is the 
existence of very efficient computational diagnosis algorithms. There have been 
several efforts in estimating the time complexities of various diagnosis 
problems [4]. One basic class is that of the zero-time systems, for which the fault 
propagation times are zero for all edges. In these systems, a fault instantaneously 
propagates to all nodes of a cycle and hence the resolution of diagnosis is 
limited to cycles. Thus, we assume that a graph is condensed such that each set 
of intersecting cycles is represented by a single node [7]. This condensation can 
be done in $O(n + e)$ time, where $n$ and $e$ are the number of nodes and edges 
of the graph respectively. Without loss of generality we assume that the given 
graph is acyclic, i.e., does not contain any directed cycles. For these systems, an 
algorithm for single fault diagnosis has been proposed in [7]. This 
algorithm has a worst-case time complexity of $O(k(n - k + 1) + e)$, where $k$ is the 
number of alarms. For this problem a lower bound of $\Omega(k(n - k + 1))$ is shown 
in [7] under the decision tree computational model, when the reachability matrix 
of the graph is available. In this sense the algorithm of [7] has the optimal 
complexity, since $O(n + e)$ time is required for reading the graph, and $O(k(n - k + 1))$ 
time is required for diagnosis. But, if the graph (or its reachability matrix) has 
already been read into the memory, this algorithm still requires $O(e)$ time. 

In the present paper, we present two algorithms for the single fault diagnosis 
problem which perform better than existing algorithms under different 
conditions. 



2 Fault Propagation Model 

The fault propagation model of a system is given by the graph $G = (V, E)$, 
where $V$ is the set of components and an edge $(i, j) \in E$ denotes the fact that 
a fault at $i$ propagates to $j$. Let $|V| = n$ and $|E| = e$. Let $A$ denote the set 
of nodes to which alarms are attached, and let $A_R$ denote the set of all ringing 
alarms at a given point of time. Let $|A| = k$ and $|A_R| = k_R$. Nodes with no 
outgoing edges are referred to as zero out-degree nodes. 

For $i \in V$, the set of nodes from which an edge to $i$ exists is called the 
predecessor-set of $i$ and is denoted by $PS(i)$. Also, the set of nodes to which 
an edge from $i$ exists is called the successor-set of $i$ and is denoted by $SS(i)$. 
The set of all nodes to which a path from $i$ exists is called the from-set of $i$ 
and is denoted by $FS(i)$. To avoid the degenerate case of faults that cannot be 
detected at all, we assume that every zero out-degree node carries an alarm, i.e., 
the set of zero out-degree nodes is a subset of $A$. The reachability matrix of $G$ 
is given by $R = [r_{ij}]$, for $i, j \in V$, where $r_{ij} = 1$ if a path from $i$ to $j$ 
exists, and $r_{ij} = 0$ otherwise. 
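The definitions of this section can be made concrete on a small example. The sketch below is illustrative only: the five-node graph and the helper names are made up, and it assumes that a node reaches itself (so $i \in FS(i)$ and $r_{ii} = 1$), which is how the from-set appears to be used in the proofs later in the paper.

```python
def build_sets(nodes, edges):
    """Predecessor-sets PS(i) and successor-sets SS(i) of a directed graph."""
    pred = {v: set() for v in nodes}
    succ = {v: set() for v in nodes}
    for i, j in edges:
        succ[i].add(j)
        pred[j].add(i)
    return pred, succ


def from_set(i, succ):
    """FS(i): nodes reachable from i, including i itself (assumption)."""
    seen, stack = {i}, [i]
    while stack:
        u = stack.pop()
        for v in succ[u]:
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return seen


def reachability_matrix(nodes, succ):
    """R = [r_ij] with r_ij = 1 when j is in FS(i)."""
    order = sorted(nodes)
    rows = []
    for i in order:
        fs = from_set(i, succ)
        rows.append([1 if j in fs else 0 for j in order])
    return rows


# Hypothetical acyclic fault propagation graph.
nodes = [0, 1, 2, 3, 4]
edges = [(0, 1), (0, 2), (1, 3), (2, 3), (2, 4)]
pred, succ = build_sets(nodes, edges)
R = reachability_matrix(nodes, succ)
zero_out_degree = {v for v in nodes if not succ[v]}
```

Here nodes 3 and 4 are the zero out-degree nodes, so under the assumption of the text both would have to carry alarms.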






In the single-fault diagnosis problem we are given the set of ringing 
alarms $A_R$, and are required to compute the set of potential fault sources $P_S$. 
Note that $P_S = \emptyset$ if no single fault source explains all ringing alarms of $A_R$. 
Typically, the fault propagation graph is pre-structured once at the beginning 
to extract the information that does not specifically depend on ringing alarms, 
e.g., reachability matrix storage and level-structuring. Then, $P_S$ is computed for 
different situations that correspond to possibly different $A_R$'s. 

3 Diagnosis Algorithms 

3.1 Binary Penalty Algorithm 

Given the set of ringing alarms $A_R$, we define the penalty vector $P = [p_i]$, $i \in V$, 
where $p_i = 1$ if $i \in A_R$, $p_i = -\infty$ if $i \in A - A_R$, and $p_i = 0$ otherwise. We 
define the vector State as the product of the reachability matrix $R$ and the 
penalty vector $P$, i.e., for State $= [s_i]$, $i \in V$, we have 
$$s_i = \sum_{j \in V} r_{ij} p_j .$$
Now, a potential failure source is characterized as follows: 

Theorem 1 A node $i \in V$ is a potential fault source if and only if $s_i = k_R$, 
where $k_R$ denotes the number of ringing alarms. 

Proof. Proof of necessity follows trivially from the definition of $s_i$. 

(Sufficiency) We have 
$$s_i = \sum_{j \in V} r_{ij} p_j = \sum_{j \in FS(i)} r_{ij} p_j + \sum_{j \in V - FS(i)} r_{ij} p_j .$$
Since $\sum_{j \in V - FS(i)} r_{ij} p_j = 0$, it follows that 
$$s_i = \sum_{j \in FS(i)} r_{ij} p_j = \sum_{j \in FS(i) \cap A_R} r_{ij} p_j + \sum_{j \in FS(i) - A_R} r_{ij} p_j .$$
Note that $\sum_{j \in FS(i) - A_R} r_{ij} p_j = 0$. Since $s_i = k_R$, we have 
$|FS(i) \cap A_R| = k_R$. Thus, $FS(i) \cap A_R = A_R$ and then $A_R \subseteq FS(i)$. Therefore, all ringing 
alarms are descendants of node $i$, implying that $i$ is a potential fault source. 

The outline of the diagnosis algorithm, denoted by BinaryPenalty, is based 
on two phases: i) compute the penalty vector $P$, ii) compute the vector State $= RP$. 
The set of all potential fault sources is given by $P_S = \{i \in V \mid s_i = k_R\}$. The 
main steps in the implementation of BinaryPenalty are as follows: 

(a) Pre-structuring: We first read and store the reachability matrix of $G$ in $O(n^2)$ 
time [3]. For each $i \in V - A$, we delete $p_i$ from the penalty vector $P$ (since 
their $p_i$'s are all zero), and thereby reduce the size of $P$ to $k$ elements. Also, 
we delete from the reachability matrix $R$ the columns corresponding to these 
nodes, hence the resultant reachability matrix is of size $n \times k$. The complexity of 
this step is $O(n^2)$. 

(b) Computation of $P_S$: For any given $A_R$, if $i$ is a non-ringing alarm (i.e., $s_i = 
-\infty$) we do not need to calculate $s_i$. Thus we delete all rows corresponding 
to these nodes from the reachability matrix $R$, thereby reducing the size of 
the matrix from $n$ rows to $n - k + k_R$. Thus, the required matrix-vector 
product of phase ii) of BinaryPenalty can be computed in $O((n - k + k_R)k)$ 
time. 






Under the decision tree model (see [7] for details), the availability of $R$ does 
not affect the complexity of computing $P_S$, which has a lower bound of 
$\Omega(k(n - k + 1))$. This lower bound is achieved by the algorithm of [7] and very closely 
matched by BinaryPenalty. But, the algorithm of [7] computes $P_S$ with a 
complexity of $O(k(n - k + 1) + e)$ for any $A_R$. Thus, BinaryPenalty achieves a 
lower complexity for dense graphs at a preprocessing cost, and also is significantly 
simpler to implement. 
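The two phases of BinaryPenalty can be sketched in a few lines. This is an illustrative reading of the description above rather than the authors' code: the function names and the example graph are made up, reachability is computed by a plain depth-first search with $r_{ii} = 1$ assumed, and a product $0 \cdot (-\infty)$ is treated as 0 by simply skipping unreachable columns.

```python
import math


def reachability(n, edges):
    """Reachability matrix R (r_ii = 1 assumed) by a DFS from every node."""
    succ = [[] for _ in range(n)]
    for i, j in edges:
        succ[i].append(j)
    R = [[0] * n for _ in range(n)]
    for s in range(n):
        R[s][s] = 1
        stack = [s]
        while stack:
            u = stack.pop()
            for v in succ[u]:
                if not R[s][v]:
                    R[s][v] = 1
                    stack.append(v)
    return R


def binary_penalty(R, alarms, ringing):
    """Return P_S = {i : s_i = k_R} for the ringing alarm set."""
    # Pre-structuring: only the k alarm columns carry a non-zero penalty.
    cols = sorted(alarms)
    penalty = {j: (1 if j in ringing else -math.inf) for j in cols}
    sources = set()
    for i in range(len(R)):
        # Rows of non-ringing alarms have s_i = -inf and are skipped.
        if i in alarms and i not in ringing:
            continue
        s_i = sum(penalty[j] for j in cols if R[i][j])
        if s_i == len(ringing):
            sources.add(i)
    return sources


# Hypothetical graph: alarms on nodes 1, 3 and 4.
R = reachability(5, [(0, 1), (0, 2), (1, 3), (2, 3), (2, 4)])
alarms = {1, 3, 4}
print(binary_penalty(R, alarms, {1, 3}))
```

For the ringing set {1, 3} only node 1 qualifies, because nodes 0 and 2 also reach the silent alarm 4.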



3.2 Prime Penalty Algorithm 

Consider the set of $k_R$ distinct prime numbers $F = \{f_i\}$, $i \in A_R$, such that $f_i > 1$. 
We redefine the penalty vector $P = [p_i]$ as follows: $p_i = f_i$ if $i \in A_R$, 
$p_i = 0$ if $i \in A - A_R$, and $p_i = 1$ otherwise. Also, we define a new vector 
State $= [s_i]$, for $i \in V$, such that 
$$s_i = p_i \prod_{j \in SS(i)} s_j ,$$
with the condition that $\prod_{j \in \emptyset} s_j = 1$, i.e., $s_i = f_i$ for every zero out-degree 
node $i \in A_R$. Note that $s_i$ is a multiple of $p_j$ if and 
only if $j \in FS(i)$. Consequently, $s_i = 0$ if and only if there is a path from $i$ to 
a non-ringing alarm $j \in A - A_R$. A potential failure source is characterized by 
the following theorem. 



Theorem 2 A node $i \in V$ is a potential fault source if and only if 
$s_i \bmod \prod_{j \in A_R} f_j = 0$ and $s_i \neq 0$. 



Proof. The state $s_i$ can be decomposed as follows: 
$$s_i = p_i \left( \frac{\prod_{j \in SS(i)} s_j}{\prod_{j \in (FS(i) - \{i\}) \cap A_R} p_j} \right) \prod_{j \in (FS(i) - \{i\}) \cap A_R} p_j = B_i \prod_{j \in FS(i) \cap A_R} f_j . \qquad (1)$$
By definition, for $i \in V$, we have $\prod_{j \in (FS(i) - \{i\}) \cap A_R} p_j \neq 0$. Since the term 
$\prod_{j \in SS(i)} s_j$ can be expressed as a multiple of $\prod_{j \in (FS(i) - \{i\}) \cap A_R} p_j$, it follows that 
$B_i$ is an integer. 

Proof of necessity follows immediately from the definition of $s_i$. 

Sufficiency: Consider that the condition $s_i \bmod \prod_{j \in A_R} f_j = 0$ and $s_i \neq 0$ is 
satisfied. Since $s_i \neq 0$, no non-ringing alarm is reachable from $i$. Also, $s_i$ is a 
multiple of $\prod_{j \in A_R} f_j$. From Eq. (1), we have $s_i$ to be a multiple of 
$\prod_{j \in FS(i) \cap A_R} f_j$. Now, since all $f_j$'s are mutual primes, we have 
$\prod_{j \in A_R} f_j = \prod_{j \in FS(i) \cap A_R} f_j$. To see this, consider that there exists 
$u \in A_R$ with $u \notin FS(i)$ such that $\prod_{j \in A_R} f_j = f_u \prod_{j \in FS(i) \cap A_R} f_j$, 
which means that $u$ is not reachable from $i$, and hence $f_u$ does not appear in $s_i$; 
consequently, $s_i \bmod \prod_{j \in A_R} f_j \neq 0$, which is a contradiction. Hence, we conclude 
that $A_R = FS(i) \cap A_R$ and then $A_R \subseteq FS(i)$. It follows that $i$ explains all 
ringing alarms $j \in A_R$ in addition to not being connected to any non-ringing 
alarm; thus, $i$ is a potential fault source. 

Based on this theorem, the set of all potential failure sources $P_S$ is given by 
$P_S = \{i \in V \mid s_i \bmod \prod_{j \in A_R} f_j = 0,\ s_i \neq 0\}$. 

We define the level-structuring of $G$, which partitions $V$ into levels, such that 
the nodes of level $l$ are denoted by $L(l)$. We start with $L(1)$ consisting of the zero 
out-degree nodes of $G$. The nodes of $L(l + 1)$ 
consist of all nodes of 0 out-degree in the graph obtained by removing from $G$ the 
nodes of $L(1), L(2), \ldots, L(l)$ together with their incident edges. The partition $L(l)$, 
$l = 1, 2, \ldots$ can be computed in $O(n + e)$ time [7]. The outline of the diagnosis 
algorithm, denoted by PrimePenalty, can be summarized in two steps: i) compute the 
penalty vector $P = [p_i]$ by utilizing the $k_R$ prime numbers $F$, ii) compute 
$s_i = p_i \prod_{j \in SS(i)} s_j$ for $i \in L(l)$, $l = 1, 2, \ldots$, and then the set of all potential 
failure sources $P_S$ is given by 
$P_S = \{i \in V \mid s_i \bmod \prod_{j \in A_R} f_j = 0,\ s_i \neq 0\}$ according to Theorem 2. 
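A compact sketch of PrimePenalty follows, under stated assumptions. The function names and the example graph are illustrative; primes are generated by naive trial division instead of being read from a published list; and a memoized recursion over successors replaces the explicit level-structuring, which evaluates the same products $s_i = p_i \prod_{j \in SS(i)} s_j$ with successors computed before their predecessors. Python integers are unbounded, so the overflow issue discussed later in the paper does not arise here.

```python
from math import prod


def first_primes(count):
    """The first `count` primes by trial division (fine for small examples)."""
    primes, candidate = [], 2
    while len(primes) < count:
        if all(candidate % p for p in primes):
            primes.append(candidate)
        candidate += 1
    return primes


def prime_penalty(n, edges, alarms, ringing):
    """Return P_S = {i : s_i != 0 and s_i mod prod(f_j) == 0}."""
    succ = [[] for _ in range(n)]
    for i, j in edges:
        succ[i].append(j)
    ring = sorted(ringing)
    f = dict(zip(ring, first_primes(len(ring))))  # one prime per ringing alarm
    # Penalty: f_i for ringing alarms, 0 for silent alarms, 1 otherwise.
    p = [f[i] if i in f else (0 if i in alarms else 1) for i in range(n)]
    s = [None] * n

    def state(i):
        # The empty product is 1, so zero out-degree nodes get s_i = p_i.
        if s[i] is None:
            s[i] = p[i] * prod(state(j) for j in succ[i])
        return s[i]

    modulus = prod(f.values())
    return {i for i in range(n) if state(i) != 0 and state(i) % modulus == 0}


# Hypothetical graph: alarms on nodes 1, 3 and 4.
edges = [(0, 1), (0, 2), (1, 3), (2, 3), (2, 4)]
alarms = {1, 3, 4}
print(prime_penalty(5, edges, alarms, {1, 3, 4}))
```

With all three alarms ringing, node 0 obtains $s_0 = 90$, a non-zero multiple of $2 \cdot 3 \cdot 5 = 30$, and is the only candidate.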

Consider the computational model in which mod and integer product 
operations take $O(1)$ time. For $k < 751,322$, we obtain $k$ prime numbers from 
the available lists¹ at http://www.utm.edu/research/primes/lists. The 
computational complexity of the arithmetic operations of PrimePenalty consists of two 
components: (a) product operations in computing the $s_i$'s, and (b) mod operations 
in checking if the remainder is 0. The total number of product operations in 
computing all $s_i$'s is no more than $e$, and the total number of mod operations 
is no more than $n - k + k_R$. Thus, the complexity of the algorithm is $O(n + e)$, 
which is lower than the best possible $O(k(n - k + 1) + e)$ under the decision tree 
model. Note that when $k = n/2$, the bound for the decision tree model is $\Omega(n^2)$, 
which could be much higher than $O(n + e)$ for sparse graphs. Thus, our algorithm 
represents a real reduction in the complexity. 



Effects of Integer Bounds 

In practical implementations there is a limit b on the largest integer represented 
on a computing system. The products $s_i$ used in PrimePenalty can be quite 
large and cause overflow problems even for moderate values of n (even under 
the condition k < 751,322). Thus, the bound b may have to be specifically 
accounted for ii b < n^. In such case the complexity of multiplication and mod 
operations are no longer independent of n. Hence, their complexity must be 
explicitly accounted for in the complexity of PrimePenalty as shown below. 

Any integer $x$ can be represented as a base-$b$ number consisting of $\lceil \log_b x \rceil$ 
integers, each of which is smaller than $b$. For example, let $a_0, a_1, \ldots, a_{\lceil \log_b x \rceil - 1}$ 
be such a representation of $x$, so that $x = \sum_{i=0}^{\lceil \log_b x \rceil - 1} a_i b^i$. 

^ For k > 751,322, one can generate k prime numbers once at the beginning in 
0{lP log k) time [9]. 






Using the set $F = \{f_i\}$, $i = 0, \ldots, k - 1$, of $k$ prime numbers, each such 
prime number is represented as a base-$b$ number of size $\lceil \log_b k \rceil$. Then, each $s_i$ is 
represented as a base-$b$ number of size at most $k \lceil \log_b k \rceil$. The complexity of (a) 
product operations in computing the $s_i$'s, and (b) mod operations in checking if the 
remainder is 0, both now depend on $k$. Using divide-and-conquer methods, each 
such operation on $\beta$-bit numbers can be performed in $O(\beta \log \beta \log \log \beta)$ bit 
operations (a more direct method takes $O(\beta^2)$ operations) [2]. This method is easily 
extended to handle integers smaller than $b$ instead of binary numbers in a direct 
manner, with the same order of complexity in terms of integer operations. In each 
of these operations, we have $\beta \le k \lceil \log_b k \rceil$. Thus, the total complexity 
of the arithmetic operations of PrimePenalty is $O((n + e)\, \beta \log \beta \log \log \beta)$, 
where $\beta = k \lceil \log_b k \rceil$. 
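The base-$b$ representation used in this analysis is easy to illustrate. The helper names below are hypothetical, and the sketch only shows how a large product would be split into machine-sized digits $a_i < b$ and recombined; it does not implement the fast multiplication of [2].

```python
def to_base(x, b):
    """Digits a_0, a_1, ... of a non-negative integer x, least significant first."""
    digits = []
    while x:
        x, r = divmod(x, b)
        digits.append(r)
    return digits or [0]


def from_base(digits, b):
    """Recombine digits into x = sum(a_i * b**i)."""
    return sum(a * b**i for i, a in enumerate(digits))


# A product of the first ten primes, split into 16-bit digits (b = 2**16).
x = 2 * 3 * 5 * 7 * 11 * 13 * 17 * 19 * 23 * 29
digits = to_base(x, 2**16)
print(len(digits), from_base(digits, 2**16) == x)
```

Every digit stays below the bound $b$, so each can be held in a machine word while the number of digits grows with the size of the product.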

4 Conclusion 

The single fault diagnosis problem deals with computing the set of all potential 
failure sources corresponding to a set of ringing alarms in a fault propagation 
graph. We presented two algorithms, BinaryPenalty and PrimePenalty, for 
this problem. These algorithms are better than existing ones under different 
conditions. For $k < 751,322$, the $O(n + e)$ complexity of PrimePenalty under the 
unit-cost arithmetic model also matches the universal lower bound of $\Omega(n + e)$ 
for any graph algorithm. Future investigations could consider 
multiple fault diagnosis and alarm placement. 



References 

1. K. H. Abbott. Robust operative diagnosis as problem solving in a hypothesis space. 
Proc. Seventh Nat'l Conf. Artificial Intelligence, St. Paul, Minn., 1988. 

2. A. V. Aho, J.E. Hopcroft, and J.D. Ullman. The Design and Analysis of Computer 
Algorithms. Addison-Wesley Pub., Reading, MA, 1974. 

3. T. H. Cormen, C.E. Leiserson, and R.L. Rivest. Introduction to Algorithms. 
McGraw-Hill Book Co., New York, 1990. 

4. T. Bylander D. Allemang, M.G. Tanner and J. Josephson. Computational 
complexity of hypothesis assembly. Proc. Tenth Int’l Joint Conf. Artificial Intelligence, 
Milan, 1987. 

5. D. Lamb D. Chester and P. Dhurjati. Rule-based computer alarm analysis in 
chemical process plants. Proc. Micon-delcn., 22, 1984. 

6. E. O’Shima J. Shiozaki, H. Matsuyama and M. Iri. An improved algorithm for 
diagnosis of system failures in the chemical process. Comp. Chem. Eng., 9(3), 
1985. 

7. N.S.V. Rao. Computational complexity issues in operative diagnosis of graph-based 
systems. IEEE Trans. on Computers, 42(4), 1993. 

8. R. Reiter. A theory of diagnosis from first principles. Artif. Intell., 32, 1987. 

9. P. Ribenboim. The Little Book of Big Primes. Springer- Verlag, 

10. 196 

11. G. Karsai S. Padalkar and N. Miyasaka. Real-time fault diagnosis. IEEE Expert, 
6, 1991. 



On a Model-Based Diagnosis for Synchronous 
Boolean Network 



Satoshi Hiratsuka and Akira Fusaoka 

Department of Computer Science, Ritsumeikan University 
Nojihigashi, Kusatsu-city, 525-8577, Japan 



Abstract. In this paper, we propose a new diagnosis method for a 
synchronous boolean network (SBN) based on the concept of model-based 
diagnosis. We present an effective method to enumerate the set 
of all candidate failure components incrementally, and also present a 
whole system for SBN diagnosis based on this algorithm. 



1 Introduction 

A synchronous boolean network (hereafter SBN) is a network of logical gates 
with delay elements which operates on a single-phase clock. The diagnostic task 
for SBN is very troublesome because it is necessary to repeat the diagnosis of 
the internal circuits for the given state transitions [4]. However, the diagnosis 
of SBN is practically significant because a typical LSI contains many flipflops 
which cannot be observed by scanning or direct probing. 

In this paper, we propose a new diagnosis method for SBN based on the 
concept of model-based diagnosis. Model-based diagnosis is a general 
method which allows determination of the set of faulty components from the 
system description $SD$ and the observation of the incorrect behavior $OBS$ [3]. 
The diagnostic task is to detect the clause $CF = \{Ab(c_1) \vee Ab(c_2) \vee \ldots \vee Ab(c_k)\}$ 
such that $SD \cup OBS \supset CF$, where the literal $Ab(c)$ means that the component $c$ 
is functioning abnormally if it is positive. The clause $CF$ is called a conflict set, 
which means that at least one of its elements must be faulty. Since we need only 
the minimal conflict sets from this definition, each is a prime implicate of $SD \cup 
OBS$ when they are formulated in propositional logic [2]. The set of actual 
diagnoses can be formed by transforming the conjunction of all minimal conflict 
sets into disjunctive normal form, each product term of which constitutes a 
diagnosis. 
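The step from minimal conflict sets to diagnoses can be illustrated with a brute-force sketch. It is not the incremental algorithm proposed in the paper: the function name and the gate names are hypothetical, each conflict set is given simply as the set of components whose $Ab$ literals it contains, and product terms that strictly contain another term are dropped by absorption (an added simplification). The expansion is exponential in the number of conflict sets, so it is suitable for small examples only.

```python
from itertools import product


def diagnoses(conflict_sets):
    """Expand a conjunction of conflict sets into minimal DNF product terms.

    Each product term picks one component from every conflict set; terms
    that strictly contain another term are removed.
    """
    candidates = {frozenset(choice) for choice in product(*conflict_sets)}
    return {d for d in candidates if not any(other < d for other in candidates)}


# Hypothetical conflict sets: Ab(g1) v Ab(g2) and Ab(g2) v Ab(g3).
conflicts = [{"g1", "g2"}, {"g2", "g3"}]
result = diagnoses(conflicts)
print(sorted(sorted(d) for d in result))
```

For these two conflict sets the surviving terms are {g2} and {g1, g3}: either gate g2 alone is faulty, or g1 and g3 are faulty together.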

A diagnostic process for SBN is generally formulated as the problem of finding 
the conflict set such that $(\exists z)[SD \cup OBS \supset CF]$, where $z$ is a sequence of state 
variables. It contains two search problems: first we must detect the candidate 
fault components, and secondly we must find values of the internal states 
such that the incorrect behavior of the components is actually consistent with $OBS$. 
Namely, we need not only to detect a conflict set but also to examine its feasibility 
for the sequence of the observed data. Therefore, it is necessary to deal with the 



R. Loganantharaj et al. (Eds.): IEA/AIE 2000, LNAI 1821, pp. 198-204, 2000.
© Springer-Verlag Berlin Heidelberg 2000



On a Model-Based Diagnosis for Synchronous Boolean Network 199 



faulty component together with the information about the situation in which it
operates incorrectly. In this paper, we introduce the abnormal-literals not only to
indicate incorrect components but also to represent their failure modes. We deal with
only the stuck-at fault mode, that is, the output or input of a faulty gate is fixed
to 1 or 0. This is the most natural assumption given the physical structure of
the gate. However, this enlargement of the abnormal-literals leads to a tremendous
increase in the complexity of computing the conflict sets if they are
logically generated from $SD \cup OBS$ via theorem proving. The search
space for the feasibility check also becomes huge. In this paper, we propose
an effective method for SBN diagnosis, which is based on an incremental
algorithm to compute the conflict sets.



2 Problem Description 

2.1 System Description 

An SBN consists of interconnections of gates and D-flipflops. We denote by x, y, z
the sets of input, output and state variables, respectively. To represent the
interconnection of components, we also use the set of intermediate variables u.
We introduce abnormal literals for every component and its failure modes. The
set of abnormal literals is denoted by AB. We assume that every fault is
permanent, namely the literals in AB keep the same value during the process of
observation.

(1) System description The system description SD is a set of clauses which
describes the normal behavior of the components. We use ANDgates, ORgates and
D-flipflops as components. They are specified by the following formulas:

ANDgate: $\neg Ab \supset \{\{\neg x_1 \vee \neg x_2 \vee y\}, \{x_1 \vee \neg y\}, \{x_2 \vee \neg y\}\}$

ORgate: $\neg Ab \supset \{\{x_1 \vee x_2 \vee \neg y\}, \{\neg x_2 \vee y\}, \{\neg x_1 \vee y\}\}$

D-flipflop: $\neg Ab \supset \{\{\neg z \vee z'\}, \{z \vee \neg z'\}\}$

We denote the clausal part of the system description of a component c by SD(c).
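As a quick sanity check, the sketch below assumes that the gate formulas are the usual clause encodings of y = x1 AND x2 and y = x1 OR x2, and verifies by exhaustive enumeration that the clauses admit exactly the rows of the corresponding truth table. The literal representation as (variable, polarity) pairs and all function names are choices made for this illustration only.

```python
from itertools import product

# A literal is (variable, polarity); polarity False means the variable is negated.
AND_CLAUSES = [
    [("x1", False), ("x2", False), ("y", True)],
    [("x1", True), ("y", False)],
    [("x2", True), ("y", False)],
]
OR_CLAUSES = [
    [("x1", True), ("x2", True), ("y", False)],
    [("x2", False), ("y", True)],
    [("x1", False), ("y", True)],
]

def satisfies(clauses, assignment):
    """True if every clause contains at least one literal made true."""
    return all(any(assignment[v] == pol for v, pol in clause) for clause in clauses)

def models(clauses):
    """All (x1, x2, y) assignments that satisfy the clause set."""
    result = set()
    for x1, x2, y in product((False, True), repeat=3):
        if satisfies(clauses, {"x1": x1, "x2": x2, "y": y}):
            result.add((x1, x2, y))
    return result

PAIRS = list(product((False, True), repeat=2))
AND_OK = models(AND_CLAUSES) == {(a, b, a and b) for a, b in PAIRS}
OR_OK = models(OR_CLAUSES) == {(a, b, a or b) for a, b in PAIRS}
```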

(2) Fault description We assume that every fault is a stuck-at fault. There
are six possibilities of failure for a 2 fan-in gate: $x_1$ stuck-at 0, $x_1$ stuck-at 1,
$x_2$ stuck-at 0, $x_2$ stuck-at 1, y stuck-at 0 and y stuck-at 1. However, the
stuck-at 0 failure of an input is equivalent to y stuck-at 0 for ANDgates, and
the stuck-at 1 failure of an input is equivalent to y stuck-at 1 for ORgates,
so that we have four failure patterns for each gate. We assign the abnormal
literals $L0, L1, R0, R1, Y0, Y1 \in AB$ to the failures as shown in Table 1. The
predicate Ab in the system description is defined by

$Ab \equiv L0 \vee L1 \vee R0 \vee R1 \vee Y0 \vee Y1$

By using these literals, the fault description FD is given by:

FD for ANDgate

$L1 \supset \{\{y \vee \neg x_2\}, \{\neg y \vee x_2\}\}$, $R1 \supset \{\{y \vee \neg x_1\}, \{\neg y \vee x_1\}\}$, $Y0 \supset \{\{\neg y\}\}$, $Y1 \supset \{\{y\}\}$

FD for ORgate

$L0 \supset \{\{y \vee \neg x_2\}, \{\neg y \vee x_2\}\}$, $R0 \supset \{\{y \vee \neg x_1\}, \{\neg y \vee x_1\}\}$, $Y0 \supset \{\{\neg y\}\}$, $Y1 \supset \{\{y\}\}$

FD for D-flipflop

$Y0 \supset \{\{\neg y\}\}$, $Y1 \supset \{\{y\}\}$

We denote by $FD(\ell)$ the clausal part of the above formula for the abnormal
literal $\ell$.



Table 1. Abnormal literals 



Ab literal   Operation         Equivalent logic
                               AND         OR          D-FF
-            Normal            x1 ∧ x2     x1 ∨ x2     x'
L0           x1 stuck at 0     not used    x2          not used
R0           x2 stuck at 0     not used    x1          not used
L1           x1 stuck at 1     x2          not used    not used
R1           x2 stuck at 1     x1          not used    not used
Y0           y stuck at 0      0           0           0
Y1           y stuck at 1      1           1           1



(3) Observation The observation OBS consists of a sequence of sets of unit
clauses $OBS_i$ $(i = 1, 2, \ldots, r)$. Each $OBS_i$ is an observation of the system at
stage i of the state transition.



3 Diagnosis procedure for SBN 

3.1 Overall structure 

Let G(x, y, z) be an SBN with the inputs x, the primary output y and the
states z. A diagnosis d for the SBN G is defined as a set of components
$d = \{c \mid Ab(c)\}$ such that $\bigwedge_{i=1}^{r} G(x^i, y^i, z^i) \wedge d$ is consistent for
$OBS = \{(x^i, y^i) \mid i = 1, 2, \ldots, r\}$, where $Ab(c) \equiv L0^c \vee L1^c \vee R0^c \vee R1^c \vee Y0^c \vee Y1^c$.
The set D of all diagnoses d can be built in the following two steps:

The first step: We construct the set of the minimal conflict sets by the
procedure described later and then generate a list of candidates for diagnosis
from this set.

The second step: In order to test the feasibility of each candidate d in
the list, we make a reduced SBN G' by replacing the descriptions of the faulty
components in SD of G with the corresponding clauses of FD. We denote by
SD' the system description of G'. We also denote by $\Delta$ the part of FD for the
components of d. Namely,

$\Delta = \{FD(\ell) \mid \ell \in d\}, \quad SD' = (SD - \bigcup_{\ell \in d} SD(\ell)) \cup \Delta$

We have three cases for SD' and $\Delta$:






(1) $\Delta \cup OBS$ contains a contradiction. For example, the output of a gate is
supposed to be stuck-at 0 although it takes both values 0 and 1 in OBS.
In this case, d is not feasible, so it is eliminated from the list.

(2) SD' is consistent with OBS. This means that d constitutes a diagnosis
under the condition that all other components operate normally. If d is
minimal with respect to the set of diagnoses D obtained so far, then it is added
to D and any elements of D that are no longer minimal are eliminated. Otherwise,
d is discarded.

(3) $SD' \cup OBS$ is inconsistent. In this case, d may be a feasible
but insufficient diagnosis. Therefore, we have another diagnosis problem for G'.
The original diagnosis procedure is called recursively; it halts eventually
because G' is simpler than G. The resulting set of diagnoses for G' is joined to D.

3.2 Conflict Set Generation 

(1) Rules for the conflict set formation Before we give an incremental
algorithm to compute all conflict sets of a combinational circuit, we show how
a conflict set is formed from those of the parts of the circuit.

Let F(x, y) be a combinational circuit with the inputs x and the primary
output y. For given input values x, the value of the primary output y is
called correct if it coincides with the value of y which is generated when F is a
fault-free circuit. The minimal conflict set is empty if the output y is correct. On
the other hand, we always have a non-empty conflict set for an incorrect output.

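Assume that F consists of parts $F_1$, $F_2$ and a gate (Figure 1). We have four
cases as shown in Table 2, where A and B represent the conflict sets of $F_1$ and $F_2$,
respectively. For example, case 1 in Table 2 means that both $F_1$ and $F_2$
have the incorrect output 0 and the incorrect output of F itself is 0 if the gate
is an ANDgate. The minimal conflict set for F can be formed from those of $F_1$
and $F_2$ in the following way.

The condition for F to generate y = 0 is, for any $\alpha \in A$, $\beta \in B$,

$(\alpha \wedge \beta \wedge \neg Y1) \vee (\alpha \wedge \neg Y1 \wedge \neg L1) \vee (\beta \wedge \neg Y1 \wedge \neg R1) \vee Y0$

$= (\alpha \vee \beta \vee Y0) \wedge (\neg Y1 \vee Y0) \wedge (\alpha \vee \neg R1 \vee Y0) \wedge (\beta \vee \neg L1 \vee Y0)$

From this formula, the minimal requirement for F to generate the incorrect
output is $\alpha \vee \beta \vee Y0$, so that this is the conflict set of F for case 1.

ANDgate (case 1): $\bigcup_{\alpha \in A, \beta \in B} \{\alpha \vee \beta \vee Y0\}$

Similarly, we can build the conflict sets for the other cases.

ANDgate (case 2): $\bigcup_{\beta \in B} \{\beta \vee Y1 \vee R1\}$

ANDgate (case 3): $\bigcup_{\alpha \in A} \{\alpha \vee Y1 \vee L1\}$

ANDgate (case 4): $\bigcup_{\alpha \in A, \beta \in B} \{\alpha \vee \beta \vee Y1\} \cup \bigcup_{\alpha \in A} \{\alpha \vee Y1 \vee L1\} \cup \bigcup_{\beta \in B} \{\beta \vee Y1 \vee R1\}$

For the cases in which the circuit F is built from an ORgate or a D-flipflop,
the resulting conflict sets are given by:

ORgate (case 1): $\bigcup_{\alpha \in A, \beta \in B} \{\alpha \vee \beta \vee Y0\} \cup \bigcup_{\alpha \in A} \{\alpha \vee Y0 \vee L0\} \cup \bigcup_{\beta \in B} \{\beta \vee Y0 \vee R0\}$

ORgate (case 2): $\bigcup_{\alpha \in A} \{\alpha \vee Y0 \vee L0\}$

ORgate (case 3): $\bigcup_{\beta \in B} \{\beta \vee Y0 \vee R0\}$

ORgate (case 4): $\bigcup_{\alpha \in A, \beta \in B} \{\alpha \vee \beta \vee Y1\}$

D-flipflop (case 1): $\bigcup_{\alpha \in A} \{\alpha \vee Y0\}$

D-flipflop (case 2): $\bigcup_{\alpha \in A} \{\alpha \vee Y1\}$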



(2) An algorithm for conflict set formation We can compute all minimal
conflict sets by starting from the conflict sets of the gates on the input side and
by extending them stepwise to the higher levels based on the rules above.
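The stepwise extension can be sketched for a single ANDgate as follows. This is one reading of the AND-gate rules above, not the authors' implementation: a conflict set is modelled as a frozenset of abnormal-literal names, the literal naming scheme "Y0^3" (mode, then component) is an assumption of this sketch, and a sub-circuit whose output is correct is represented by the list [frozenset()] so that the unions remain well defined.

```python
from itertools import product

def minimize(clauses):
    """Drop every clause that has a proper subset among the other clauses."""
    clauses = set(clauses)
    return {c for c in clauses if not any(o < c for o in clauses)}

def and_gate_conflicts(case, A, B, gate):
    """Combine the conflict sets A (left part) and B (right part) at an ANDgate.

    `case` is the row number of Table 2; `gate` names the gate component.
    """
    y0, y1, l1, r1 = (frozenset({f"{m}^{gate}"}) for m in ("Y0", "Y1", "L1", "R1"))
    if case == 1:    # both inputs should be 1, the output is wrongly 0
        out = {a | b | y0 for a, b in product(A, B)}
    elif case == 2:  # only the right input should be 0, the output is wrongly 1
        out = {b | y1 | r1 for b in B}
    elif case == 3:  # only the left input should be 0, the output is wrongly 1
        out = {a | y1 | l1 for a in A}
    elif case == 4:  # both inputs should be 0, the output is wrongly 1
        out = ({a | b | y1 for a, b in product(A, B)}
               | {a | y1 | l1 for a in A}
               | {b | y1 | r1 for b in B})
    else:
        raise ValueError("case must be 1, 2, 3 or 4")
    return minimize(out)

# Hypothetical conflict sets of two input-side parts, combined at gate 3.
A = [frozenset({"Y0^1"})]
B = [frozenset({"Y0^2"})]
CASE1 = and_gate_conflicts(1, A, B, "3")
CASE4 = and_gate_conflicts(4, A, B, "3")
```

Applying such a function gate by gate, from the inputs towards the primary output, gives the incremental construction described in the text; the ORgate and D-flipflop rules would be handled analogously.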




Fig. 1. Circuit Description 



Table 2. Decomposition of Conflict Set 



case    u v:  00     01     10     11      Incorrect y:  AND    OR
1             AB     Aφ     φB     φφ                    0      0
2             Aφ     AB     φφ     φB                    1      0
3             φB     φφ     AB     Aφ                    1      0
4             φφ     φB     Aφ     AB                    1      1





3.3 Feasibility Test 

The feasibility test includes the consistency checks for $\{\Delta \cup OBS\}$ and
$\{SD' \cup OBS\}$ and the recursive call of the algorithm
$diagnosis(SD', FD - \Delta, OBS, D, \Delta)$ with SD', $FD - \Delta$ and OBS.

The consistency check can be done by a direct application of the resolution
principle. In particular, the consistency of $\{\Delta \cup OBS\}$ is easily checked because $\Delta$
does not contain many clauses. The consistency check for $\{SD' \cup OBS\}$ is a
simulation which is executed efficiently by repeating the check for $\{SD' \cup OBS_i\}$
and the determination of the value of the state variables at the next stage.
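The simulation-style check can be sketched as follows, under several assumptions that are not taken from the paper: the circuit below is a hypothetical two-gate example (not the circuit of Figure 1), the unknown initial state is handled by trying every value, and gates are listed in an order in which their inputs are already computed. Faults are injected by overriding a gate input or output according to the abnormal literals L0, L1, R0, R1, Y0 and Y1.

```python
from itertools import product

def gate_output(kind, a, b, fault=None):
    """Output of a 2-input gate under an optional stuck-at mode."""
    if fault in ("Y0", "Y1"):          # output stuck at 0 or 1
        return int(fault[1])
    if fault in ("L0", "L1"):          # left input stuck at 0 or 1
        a = int(fault[1])
    if fault in ("R0", "R1"):          # right input stuck at 0 or 1
        b = int(fault[1])
    return a & b if kind == "AND" else a | b

def step(gates, flipflops, inputs, state, faults):
    """Evaluate one clock stage; return all signal values and the next state."""
    values = dict(inputs)
    for name, _d, q in flipflops:
        fault = faults.get(name)
        values[q] = int(fault[1]) if fault in ("Y0", "Y1") else state[q]
    for name, kind, left, right, out in gates:
        values[out] = gate_output(kind, values[left], values[right], faults.get(name))
    next_state = {q: values[d] for _name, d, q in flipflops}
    return values, next_state

def consistent(gates, flipflops, observations, faults):
    """True if some initial state reproduces every observed value."""
    state_names = [q for _name, _d, q in flipflops]
    for bits in product((0, 1), repeat=len(state_names)):
        state = dict(zip(state_names, bits))
        for inputs, observed in observations:
            values, state = step(gates, flipflops, inputs, state, faults)
            if any(values[s] != v for s, v in observed.items()):
                break
        else:
            return True
    return False

# Hypothetical circuit: u = x1 AND x2, y = u OR z, flipflop ff stores u in z.
GATES = [("g1", "AND", "x1", "x2", "u"), ("g2", "OR", "u", "z", "y")]
FLIPFLOPS = [("ff", "u", "z")]
OBS = [({"x1": 1, "x2": 1}, {"y": 1}), ({"x1": 0, "x2": 0}, {"y": 0})]
```

In this toy case the fault-free circuit cannot explain the second observation, because the flipflop holds the 1 computed at the first stage, whereas consistent(GATES, FLIPFLOPS, OBS, {"ff": "Y0"}) succeeds.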






4 Example 

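The system and fault descriptions SD, FD and the observation OBS for the
circuit of Figure 1 are given by the following sets of clauses.

$SD = SD(1) \cup SD(2) \cup SD(3) \cup SD(4)$ where

$SD(1) = \{\{\neg x_1 \vee \neg x_2 \vee u\}, \{x_1 \vee \neg u\}, \{x_2 \vee \neg u\}\}$

$SD(2) = \{\{\neg x_3 \vee \neg x_4 \vee v\}, \{x_3 \vee \neg v\}, \{x_4 \vee \neg v\}\}$

$SD(3) = \{\{u \vee v \vee \neg y\}, \{\neg u \vee y\}, \{\neg v \vee y\}\}$

$SD(4) = \{\{\neg v \vee x_4'\}, \{v \vee \neg x_4'\}\}$

$FD = \{L1^1 \supset \{\{u \vee \neg x_2\}, \{\neg u \vee x_2\}\}, R1^1 \supset \{\{u \vee \neg x_1\}, \{\neg u \vee x_1\}\}, Y0^1 \supset \{\{\neg u\}\}, Y1^1 \supset \{\{u\}\},$

$L1^2 \supset \{\{v \vee \neg x_4\}, \{\neg v \vee x_4\}\}, R1^2 \supset \{\{v \vee \neg x_3\}, \{\neg v \vee x_3\}\}, Y0^2 \supset \{\{\neg v\}\}, Y1^2 \supset \{\{v\}\},$

$L0^3 \supset \{\{y \vee \neg v\}, \{\neg y \vee v\}\}, R0^3 \supset \{\{y \vee \neg u\}, \{\neg y \vee u\}\}, Y0^3 \supset \{\{\neg y\}\}, Y1^3 \supset \{\{y\}\},$

$Y0^4 \supset \{\{\neg x_4'\}\}, Y1^4 \supset \{\{x_4'\}\}\}$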

OBSi = {{xi}, {X 2 }, {xa}, {X 4 }, {?/}}, OBS 2 = {{xi}, {X 2 }, {xs}, {y}} 

We start with $diagnosis(SD, FD, OBS, \phi, \phi)$;

CF formation: At the first stage, the primary output y has the correct output 0.
The secondary output v has a CF if u = 1. At the second stage, $x_4$
has a CF $\{Y1^2, R1^2, Y1^4\}$ if $x_4 = 1$, and v has the same CF if u = 1. The
primary output y has the incorrect output 1, so that it has a CF $\{Y1^3, Y1^1, Y1^4, Y1^2, R1^2\}$.

Feasibility test: We select the candidates of diagnosis $d = (Y1^3)$, $d = (Y1^1)$,
$d = (Y1^4)$, $d = (Y1^2)$, $d = (R1^2)$ in this order. The diagnosis $d = (Y1^3)$ is
discarded because $OBS \cup \Delta$ is inconsistent due to $\Delta = \{\{y\}\}$. For $d = (Y1^1)$, we
have a reduced SBN1 with $SD' = SD - SD(1) \cup \Delta$ where $\Delta = \{\{u\}\}$. However,
$SD' \cup OBS$ is inconsistent, so that we call $diagnosis(SD', FD - \Delta, OBS, \phi, \Delta)$;

CF formation for SBN1: We have a CF $\{Y0^3, L0^3\}$.

Feasibility test for SBN1: $d' = (Y0^3)$ is discarded because $OBS \cup \Delta$ is
inconsistent where $\Delta = \{\{\neg y\}, \{u\}\}$. For $d' = (L0^3)$, we build the reduced SBN2
with $SD'' = SD' - SD(3) \cup \Delta$ where $\Delta = \{\{v \vee \neg y\}, \{\neg v \vee y\}, \{u\}\}$. However,
$SD'' \cup OBS$ is inconsistent, so that we call again $diagnosis(SD'', FD - \Delta, OBS, \phi, \Delta)$;

CF formation for SBN2: We have a CF $\{Y1^2, R1^2, Y1^4\}$.

Feasibility test for SBN2: $d'' = (Y1^2)$ is discarded because $OBS \cup \Delta$ is
inconsistent where $\Delta = \{\{u\}, \{v \vee \neg y\}, \{\neg v \vee y\}, \{v\}\}$. Also $d'' = (R1^2)$ is discarded
because $\Delta = \{\{v \vee \neg x_3\}, \{\neg v \vee x_3\}, \{v \vee \neg y\}, \{\neg v \vee y\}, \{u\}\}$ is inconsistent with OBS.
For $d'' = (Y1^4)$, we make the reduced SBN3 with $SD''' = SD'' - SD(4) \cup \Delta$
where $\Delta = \{\{x_4'\}, \{v \vee \neg y\}, \{\neg v \vee y\}, \{u\}\}$. $SD''' \cup OBS$ is consistent, so that
$(Y1^1, L0^3, Y1^4)$ constitutes a diagnosis. We return to the selection of a
new diagnosis with $D = \{(Y1^1, L0^3, Y1^4)\}$.

For the next candidate $d = (Y1^4)$, we make a new SBN with $SD' = SD - SD(4) \cup \Delta$
where $\Delta = \{\{x_4\}\}$ after the second stage. This reduced SBN is
consistent with OBS, so that we have a diagnosis $d = (Y1^4)$. Note that this is
simpler than $(Y1^1, L0^3, Y1^4)$. Therefore, we replace the content of D. Namely,
$D = \{(Y1^4)\}$.

Since we can discard the candidates $d = (Y1^2)$, $d = (R1^2)$ by similar
reasoning, we finally have $D = \{Y1^4\}$. Therefore, we can conclude that the output
of the D-flipflop has a stuck-at 1 failure.






5 Concluding Remarks 

In this paper, we proposed a new method of model-based diagnosis for a
synchronous boolean network. In particular, we described an algorithm for conflict
set formation which aims to keep the set of generated clauses as small as
possible during the diagnosis. The investigation of its complexity on
actual benchmark circuits is left for future work.



References 

1. de Kleer, J. and Williams, B. C. 1989. Diagnosis with behavioral modes, Proc.
of IJCAI-89, 1324-1330.

2. Dressler, O., and Struss, P. 1996. The Consistency-based Approach to
Automated Diagnosis of Devices, in Principles of Knowledge Representation, edited
by G. Brewka, 267-311, CSLI Publications.

3. Reiter, R. 1987. A Theory of Diagnosis from First Principles, Artif. Intell.,
32:57-95.

4. Venkataraman, S., Hartanto, I. and Fuchs, W. K. 1996. Dynamic Diagnosis
of Sequential Circuits Based on Stuck-at Faults, Proc. of the VLSI Test
Symposium, 198-203.



DermatExpert: Dermatological Diagnosis 
through the Internet 



Hans W. Guesgen and Jeong Seon Koo*

Computer Science Department, University of Auckland
Auckland, New Zealand
{hans, jkoo002}@cs.auckland.ac.nz



Abstract. This paper describes the internet-based expert system 
DermatExpert. DermatExpert allows users to self-diagnose the most 
common dermatological diseases at a sub-professional level, using various 
heuristics and diagnostic strategies. The main process in DermatExpert 
is an optimization process, which searches among candidate solutions 
that could explain the patient’s symptoms, trying to exclude hopeless 
candidates as early as possible. 



1 Introduction 

DermatExpert is an intelligent knowledge-based inference system for diagnos- 
ing dermatological diseases over the internet. Currently it contains 38 disease 
objects and 121 symptom objects in its knowledge base. The primary aim of 
DermatExpert is to perform the diagnosis of skin problems. DermatExpert com- 
municates with a user by asking questions and getting responses through a web 
interface. The system analyzes a user’s answers and generates the next question 
dynamically. 

The main strategy of diagnosis in DermatExpert is to find a disease which 
reasonably well explains the findings obtained through the analysis of user an- 
swers, i.e., it tries to find a good match between the conditions of a disease 
and the status of the patient. To implement this idea, DermatExpert applies a 
matching measurement which indicates how each disease object matches with 
the patient’s symptoms. A negative value of this measurement means that the 
conditions of the disease object are ill-matched; a positive one means that they 
are well-matched. Once the best case match degree of a disease object becomes 
negative, the system prunes the disease object, and reduces the searching space. 
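The pruning idea can be sketched as follows. The paper does not give the formula of its matching measurement, so the additive weights, the function names and the toy knowledge base (diseases D1, D2 and symptoms S1 to S3) are purely illustrative assumptions: a confirmed symptom adds its weight, a denied symptom subtracts it, and the best case assumes that every symptom not yet asked about turns out to be present.

```python
def match_degree(weights, answers):
    """Add the weights of confirmed symptoms, subtract those of denied ones."""
    score = 0
    for symptom, weight in weights.items():
        if symptom in answers:
            score += weight if answers[symptom] else -weight
    return score

def best_case(weights, answers):
    """Match degree if every symptom not yet asked about were present."""
    unasked = sum(w for s, w in weights.items() if s not in answers)
    return match_degree(weights, answers) + unasked

def prune(knowledge_base, answers):
    """Keep only the diseases whose best-case match degree is not negative."""
    return {d: w for d, w in knowledge_base.items() if best_case(w, answers) >= 0}

# Hypothetical knowledge base: disease -> {symptom: weight}.
KNOWLEDGE_BASE = {
    "D1": {"S1": 3, "S2": 2},
    "D2": {"S3": 4, "S1": 1},
}
ANSWERS = {"S1": True, "S3": False}   # the user confirmed S1 and denied S3
REMAINING = prune(KNOWLEDGE_BASE, ANSWERS)
```

With these answers D2 can no longer reach a non-negative value, even in the best case, so it is removed from the search space, while D1 stays as a candidate.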

There are various application scenarios for DermatExpert. For example, peo- 
ple can consult the system to find out more about their skin problems early in 
the development process so that they go to a doctor and get treatment in time. 
Or for a simple problem, which does not require a doctor’s consultation (which 
is often the case), DermatExpert can guide people to treat their problem by 
themselves, and save resources. Secondarily, DermatExpert can be used for an 

* The authors are grateful to Stephen Wealthall for his comments on earlier versions
of this paper.

R. Loganantharaj et al. (Eds.): IEA/AIE 2000, LNAI 1821, pp. 204-209, 2000.
© Springer-Verlag Berlin Heidelberg 2000







Fig. 1. The Process of Knowledge Acquisition for DermatExpert 



educational purpose. Medical students can use this system to train their diag- 
nosing skills, even though the diagnostic strategy of the system is not exactly in 
accord with that of doctors. 

In the following, we first describe the knowledge acquisition process of
DermatExpert and then present the knowledge representation schemes and
diagnostic strategy of DermatExpert.¹ Finally, we give a brief analysis of the system
and sketch possible future work.

2 Knowledge Acquisition 

Knowledge acquisition is the process of extracting and formalizing the knowledge 
of a domain [2,4,8]. Knowledge acquisition is one of the crucial stages in the 
development of an expert system, since the quality of the system relies on the 
accuracy and the structure of the knowledge base. There are many sources of 
domain knowledge, including human experts, textbooks, databases, and our own 
experience. Human experts provide flexible ways of knowledge acquisition. The 
typical methodology of acquisition from human resources is the interview. 

Knowledge acquisition in DermatExpert focused on different aspects during 
different stages of the acquisition process. Figure 1 illustrates the process of 
knowledge acquisition for DermatExpert. The conceptualization step involved 
getting the feel for the domain area and deciding the structure of the knowledge 
base. Primary concepts were obtained from textual sources. The next step was 

¹ Due to space limitations, we cannot discuss here how the problem of maintaining
system consistency and integrity is handled.






to write an initial knowledge base. At this stage, the focus was on each single 
disease to determine its unique characteristics. The following step was refining 
the knowledge base in collaboration with domain experts. 

Ideally, cooperation with human experts is desirable from the beginning
of the project, but this was unfortunately not the case for DermatExpert, as
appropriate support was not available until the first version of the knowledge
base was completed. As a result, the knowledge base of DermatExpert ended up
being disease-oriented, since such a knowledge base can be built, to some extent,
by exploring each disease in turn.

The initial frame structure (slots) was designed after analyzing ten common
skin diseases: Measles, German Measles, Scabies, Chicken-Pox, Impetigo, Acne,
Lice, Insect-Bite, Eczema, and Psoriasis. Information about each of these
diseases was extracted from five to six dermatology textbooks. Then each piece of
information was converted into a key word and its relation with the disease.
After that, the information was refined and professional terms were converted into
plain English. For most diseases, there was information about onset duration,
affected site, and spread phase. DermatExpert provides specific slots for these
items in the Disease frame. For other items, which depended on the disease,
slots were more flexible. Clarifying the relation of each piece of information with
the disease was the next step. Since textbooks usually describe findings of the
disease, the information given in the textbooks mostly constitutes a necessary
condition for the disease. This condition can be formulated in rules of the form
If a patient has a disease D, then he or she would have symptoms S1, S2, S3 . . .

The symptoms were classified into three groups: constraint, major symptom,
and minor symptom. If any of these was a definite clue for the disease, then the
symptom became pathognomonic².



3 The Knowledge Base 

Numerous knowledge representation schemes have been proposed and
implemented, which may be classified into the following categories [6]: logical
knowledge representation schemes, network knowledge representation schemes,
procedural knowledge representation schemes, and structured knowledge
representation schemes. In addition to these four main methods, other approaches such
as inclusion hierarchies, scripts, constraints, and relational databases have been
developed [10]. Inclusion hierarchies handle a particular kind of knowledge very
well: knowledge about objects that can be classified into groups such that some
categories are subcategories of others. With the object-oriented approach,
knowledge can be handled with a high level of abstraction. Scripts have been used in
some experimental systems for natural language understanding to represent
scenarios with standard chronology. Scripts are like frames with additional support
for describing chronology. A constraint is a relationship among one, two, or more

² Pathognomonic is a medical term, describing a symptom or sign that is characteristic
of or unique to a particular disease. The presence of such a sign or symptom allows
positive diagnosis of the disease.






objects; the constraint has to be satisfied by the system in finding a solution to 
a problem [3]. By emphasizing the use of constraints in representing a set of 
objects and their interrelations, a constraint-based approach to knowledge rep- 
resentation may be used. Finally, relational databases can be sometimes used 
as a method for knowledge representation; they are good at manipulating large 
amounts of well structured information in certain, largely preconceived ways. 

DermatExpert employs the concepts of frames, logical representation, and 
constraint theory, even though it uses different structures. The main objects in 
DermatExpert are Disease, Symptom (which includes signs and medical find- 
ings), and Patient. In dermatological diagnosis, there are some essential factors 
to be checked in most cases regardless of conditions [9]: present complaint, time of 
onset, developmental stage, personal history, and family history. DermatExpert 
selects and uses some of these factors in the prefiltering module to limit the 
search space. 

The knowledge base of DermatExpert comprises a list of disease objects and 
a list of symptom objects. The former is represented in a data structure called 
DiseaseVector, which is a vector of diseases containing information about each 
disease; the latter is represented in SymptomTable which is a table of symptoms 
containing information about each symptom. Since the system is designed to be 
used by non-professionals, it does not expect that users have the same insight 
into medical phenomena as doctors do. There are some symptoms which can 
be observed easily by doctors but not by non-professionals. If the user does not 
recognize the symptom, it is the same as absence of the symptom, at least to 
the user. The knowledge of the system is constructed from this point of view. 
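A minimal sketch of how such a knowledge base could be laid out is given below. Only the names DiseaseVector and SymptomTable, the three fixed slots and the four symptom roles come from the text; the class layout, field names, helper function and the placeholder entries D, S1 and S2 are assumptions made for illustration and say nothing about the actual implementation.

```python
from dataclasses import dataclass, field

ROLES = ("constraint", "major", "minor", "pathognomonic")

@dataclass
class Symptom:
    name: str
    question: str                      # plain-English question shown to the user

@dataclass
class Disease:
    name: str
    onset_duration: str                # fixed slots named in the text
    affected_site: str
    spread_phase: str
    symptoms: dict = field(default_factory=dict)   # symptom name -> role

    def add_symptom(self, symptom_name, role):
        if role not in ROLES:
            raise ValueError(f"unknown role: {role}")
        self.symptoms[symptom_name] = role

# Placeholder content; real entries would come from textbooks and experts.
symptom_table = {
    s.name: s
    for s in (Symptom("S1", "placeholder question 1"),
              Symptom("S2", "placeholder question 2"))
}
example = Disease("D", "placeholder", "placeholder", "placeholder")
example.add_symptom("S1", "major")
example.add_symptom("S2", "minor")
disease_vector = [example]

def diseases_with(symptom_name):
    """Names of all diseases whose frame lists the given symptom."""
    return [d.name for d in disease_vector if symptom_name in d.symptoms]
```

Keeping the symptoms in a separate table lets several disease frames refer to the same symptom object, which is what makes a lookup from a symptom back to its related diseases straightforward.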

4 Diagnostic Strategies 

When doctors diagnose, they use their experience and knowledge, and sometimes 
rely on the results of lab tests or medical instruments. One of the most crucial 
factors is their intuition, which may come from their experience, knowledge, or 
other sources. Human diagnosis, especially if it is based on intuition, is hard 
to systematize and implement in a computer program, and therefore a different 
approach is required for machine diagnosis. Unlike human diagnosis, the only 
source of diagnosis in DermatExpert is its knowledge base, which was extracted 
from textbooks and knowledge provided by human experts. 

In DermatExpert, a diagnostic problem is viewed as a search problem. A 
search procedure may follow one of two directions: one is forward, which starts 
from an initial state (symptom) and searches through a problem space to a goal 
state (disease), and the other is backward, which searches from a goal to an initial 
state. In most production rule systems, for example MYCIN, the main search
direction is forward (except in the explanation step) [1]. This approach is not
suitable here, as a single symptom does not tell much about a particular disease.
For this reason, the diagnostic engine of DermatExpert employs a bidirectional
search, which searches both forward from the start state and backward from 
the goal simultaneously until the two paths meet somewhere in between. The 






interaction with the user triggers a forward search, based on symptoms and 
findings. When the status of the knowledge base is changed as a result of the 
forward search, a backward search is initiated. 

In general, medical information suffers from linguistic imprecision. For
example, when a patient comes with scabies, he or she might complain of slight,
considerable, or severe itching. This difference might be due to a different
physical condition or to the person's perception of the severity, and therefore it is
almost pointless to try to extract an answer from the user about the degree of
severity of the symptom. Instead, the system tries to ask a variety of questions
from different angles, and then diagnoses by checking how well all the findings
from a patient match the conditions of each disease. To determine the degree of
the match, DermatExpert uses a measurement. The result of the measurement
can be a negative value, which serves to exclude certain diseases; excluding
diseases is often easier than confirming them. For example, the fact that a patient
is male is good evidence to exclude the possibility of a pregnancy, but the
opposite (the patient being female) by itself is not strong evidence for a pregnancy.

As with most expert systems that model real-world aspects, the information
in DermatExpert's knowledge base may be erroneous or incomplete. For this
reason, DermatExpert allows the representation of uncertain knowledge and the
consideration of erroneous information. DermatExpert also allows, to some
extent, for the possibility of wrong answers given by the user. However,
consideration of uncertainty sacrifices the accuracy of the system, which means that
the more the system allows for uncertain information and inaccurate feedback
from users, the less accurate the system becomes.

An important factor influencing the efficiency of the system is the selection
of the next symptom to be checked. DermatExpert starts by considering the
symptom that is related to the largest set of diseases. Only when the system
reaches the final stage is a depth-first search initiated, which checks all
symptoms related to the candidate disease. This way, exhaustive search can be avoided
in most cases.
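The selection heuristic can be sketched as follows, assuming (this is not stated in the paper) that ties are broken alphabetically and that only symptoms not yet asked about are considered. The function name and the placeholder diseases and symptoms are hypothetical.

```python
def next_symptom(candidates, asked):
    """Pick the unasked symptom that is related to the most candidate diseases.

    `candidates` maps each remaining disease to the set of its symptoms.
    Returns None when every related symptom has already been asked about.
    """
    counts = {}
    for symptoms in candidates.values():
        for s in symptoms:
            if s not in asked:
                counts[s] = counts.get(s, 0) + 1
    if not counts:
        return None
    # sorted() makes the tie-breaking deterministic (alphabetical order).
    return max(sorted(counts), key=counts.get)

# Hypothetical candidates: S2 is shared by all three diseases.
CANDIDATES = {
    "D1": {"S1", "S2"},
    "D2": {"S2", "S3"},
    "D3": {"S2", "S4"},
}
FIRST = next_symptom(CANDIDATES, asked=set())
SECOND = next_symptom(CANDIDATES, asked={"S2"})
```

Asking about the most widely shared symptom first means that a single answer affects the match degree of as many candidates as possible.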

5 Conclusion 

Since the development of MYCIN, various medical diagnosis systems have been 
developed using AI techniques [5,7]. In most cases, however, the target user group 
of those systems was not the general public, but medical experts. As interest in 
health increases, and people require more knowledge about their own health 
status, medical consulting or status monitoring expert systems available to the 
public become more and more desirable. DermatExpert contributes to meeting 
these needs. DermatExpert is not meant to give professional medical advice. It 
consults users, and monitors the health status of users, so that they get a better 
understanding of their problems. 

Normally, when the performance of a system is discussed, time complexity 
and accuracy are considered. However, in an interactive system, especially when 
it is a network program, the time complexity of the algorithm is an insignificant 
factor, since user response time and network delay are usually greater than the 






runtime itself. Nevertheless, it is important to keep the size of the search space 
under control. DermatExpert applies heuristics to reduce the search space. In 
addition, it applies the exclude-hopeless-elements-first strategy to cut down the 
size of the domain exponentially, assuming the candidates of the domain are 
uniformly distributed. 

Regarding the analysis of the accuracy of the system, there are several ways
to estimate the accuracy. One way is to examine the accuracy of the
knowledge base and the inference process. Medical knowledge always changes and is
constantly updated with new facts. Differences or inconsistencies were found
between different dermatology textbooks fairly often.³ Many of them were
amended, based on the advice of human experts.

Another way, and probably the most preferred one, is to measure the accuracy
of the diagnosis. Although DermatExpert has not been tested on a clinical
scale, individual tests have indicated that the accuracy of the system is between
85% and 95%. In the remaining cases, the system could not provide an answer
and said so. In all cases where a condition was simulated that required urgent
medical advice, the system so advised. The system therefore has the accuracy
that is necessary for a self-diagnostic tool which allows an individual to gain
more information, but does not replace a medical consultation.

References 

1. B. G. Buchanan and E. H. Shortliffe. Rule-Based Expert Systems: The MYCIN
Experiments of the Stanford Heuristic Programming Project. Addison-Wesley,
Reading, Massachusetts, 1984.

2. C. Dimitris. Knowledge Engineering: Knowledge Acquisition, Knowledge
Representation, the Role of the Knowledge Engineer, and Domains Fertile to AI
Implementation. Van Nostrand Reinhold, New York, 1990.

3. H. W. Guesgen. CONSAT: A System for Constraint Satisfaction. Research Notes
in Artificial Intelligence. Morgan Kaufmann, San Mateo, California, 1989.

4. G. Guida and C. Tasso. Design and Development of Knowledge-Based Systems:
From Life Cycle to Methodology. John Wiley & Sons, Chichester, England, 1994.

5. F. Hayes-Roth, D. A. Waterman, and D. B. Lenat. Building Expert Systems.
Addison-Wesley, Reading, Massachusetts, 1983.

6. J. G. Hughes. Object-Oriented Databases. Prentice-Hall, Englewood Cliffs, New
Jersey, 1991.

7. D. McSherry. Hypothesist: A development environment for intelligent diagnostic
systems. In Proc. AIME-97, Grenoble, France, 1997.

8. N. R. Shadbolt, K. O'Hara, and G. Schreiber, editors. Advances in Knowledge
Acquisition: 9th European Knowledge Acquisition Workshop, EKAW '96, Nottingham,
United Kingdom, May 14-17, 1996: Proceedings, Berlin, Germany, 1996. Springer.

9. B. Solomons. Lecture Notes on Dermatology. Blackwell Scientific, Oxford, England,
1977.

10. S. L. Tanimoto. The Elements of Artificial Intelligence Using Common Lisp.
Computer Science Press, New York, 2nd edition, 1995.



³ A discussion of these discrepancies is beyond the scope of this paper.



Aerial Spray Deposition Management 
Using the Genetic Algorithm 



W.D. Potter¹, W. Bi¹, D. Twardus², H. Thistle², M.J. Twery²,
J. Ghent², and M. Teske³

¹ Artificial Intelligence Center, University of Georgia, USA
potter@cs.uga.edu

² United States Department of Agriculture, Forest Service
³ Continuum Dynamics, Princeton, NJ



Abstract. The AGDISP Aerial Spray Simulation Model is used to predict the
deposition of spray material released from an aircraft. The prediction is based
on a well-defined set of input parameter values (e.g., release height and droplet
size) as well as constant data (e.g., aircraft and nozzle type). But, for a given
deposition, what are the optimal parameter values? We use the popular Genetic
Algorithm to heuristically search for an optimal or near-optimal set of input
parameters needed to achieve a certain aerial spray deposition.



1 Introduction 

Determining the parameter value settings to use as input to the AGDISP Aerial Spray 
Simulation Model [2] in order to produce a desired spray material deposition is 
considered an instance of a parametric design problem [5]. Parametric design is a 
specialization or subtype of the more generic design problem. Typically, when 
working on a design problem, the solution representation is a set of instructions or 
components for achieving the design goals. This representation can also be called a 
configuration, especially if the elements comprising the configuration are predefined. 
For the parametric design problem we are dealing with, these elements correspond to 
the AGDISP simulation input parameters. Each parameter has its own domain and 
range of values. If we arrange the parameters in a one-dimensional array or vector, 
and select some value for each parameter from that parameter’s range then we would 
have an input parameter configuration. Using this configuration as input to the 
AGDISP simulation model would yield a prediction of the spray deposition. 

For this type of problem, the total number of possible configurations can be 
extremely large. Now, if we wanted to find the best configuration to achieve a desired 
spray deposition, then we could enumerate all the possible configurations and run the 
simulation on each one to see which configuration gave the best deposition. Clearly, 
this sort of computational task is outside the scope of current computing technology. 
The configuration problem suffers from what is called combinatorial explosion, that 
is, as the number of elements increases (e.g., add more parameters), the number of 
possible configurations also increases but at an exponential rate. See the discussion 
by Mittal and Frayman in [10] for more on generic configuration tasks and their 
complexity. 
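As a rough illustration of this growth, the following sketch counts the configurations that result when every parameter is restricted to a fixed number of discrete levels. The level count of 20 and the helper name configuration_count are assumptions chosen for illustration; they are not taken from AGDISP or SAGA.

```python
import math

def configuration_count(levels_per_parameter):
    """Number of distinct configurations when parameter i can take
    levels_per_parameter[i] discrete values (hypothetical discretization)."""
    return math.prod(levels_per_parameter)

# Assume, purely for illustration, 20 candidate values per parameter.
for n_params in (3, 6, 11):
    count = configuration_count([20] * n_params)
    print(n_params, "parameters:", count, "configurations")
```

Under these assumed levels, eleven parameters already give 20^11 (about 2 x 10^14) configurations. At a hypothetical one second per simulation run, enumerating them all would take roughly 6.5 million years.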



R. Loganantharaj et al. (Eds.): IEA/AIE 2000, LNAI 1821, pp. 210-219, 2000.
© Springer-Verlag Berlin Heidelberg 2000







One method we can use to reduce the computational burden of finding a 
particular configuration is a heuristic search technique. Heuristic search techniques 
have been shown to be effective techniques for finding acceptable solutions to 
problems with very large solution spaces. The major advantage of a heuristic 
approach is its speed. The major disadvantage is that there is no guarantee that the 
heuristic search will find the best solution or configuration. The heuristic search 
technique we use is the Genetic Algorithm. In the following sections we discuss in 
more detail the configuration problem, the genetic algorithm briefly, the aerial spray 
deposition problem, and our approach to the problem along with some recent results. 



2 Previous Work 

In the development of a good heuristic approach, two methods or knowledge-based 
system approaches are available to us. These are the rule-based (experiential) 
approach using typical if-then rules, and the functional (deep or associative) approach 
based on knowledge about the structure and behavior of a system and its components 
(see [3,4] for more on the two general approaches). Our Spray Advisor Genetic 
Algorithm (SAGA) approach, however, could be considered a combination of the 
rule-based and functional paradigms (although we do not have a typical collection of 
if-then rules, expert knowledge is incorporated into SAGA in the form of the 
sophisticated AGDISP simulation model). 

Probably the most famous expert system to be developed for design applications 
is R1 (XCON) which is used to configure computer systems from customer 
specifications [1,9]. An early example of an engineering design system for 
configuring networks using heuristics is DESIGNET, developed by Bolt, Beranek, 
and Newman in the early 1980's [8]. DESIGNET is a rule based design aid that 
focuses on an iterative user interface approach to configuration based on the process a 
decision maker goes through during the design process. 

Our own experience with configuration deals with designing battlefield 
communication networks to support specific missions. Our system, called IDA-NET, 
configures a “shopping list” of communication equipment indicating type of
equipment and number of components [13]. The “shopping list” represents the
required amount of equipment to support a particular mission. The goal is to 
minimize the number and types of components yet still satisfy a set of constraints 
associated with the mission, the equipment connectivity, and the available components 
in inventory. 



3 Aerial Spray Models 

For many years, computer simulation models for predicting what happens to spray 
material released from aircraft have been a major research interest of the USDA
Forest Service [17]. The Forest Service Cramer-Barry-Grim (FSCBG) aerial spray
model [14,15] and the Agricultural Dispersal (AGDISP) model [2,16] are examples of
this research. AGDISP simulates the effects of aircraft movement and wake on
material released from the aircraft. The model predicts the behavior of spray material
droplet movement when sprayed from an airplane or helicopter. FSCBG predicts the 
dispersion of the spray material and the deposition of the material (that is, how much 
material settles on the ground and where). Both models analyze the movement of the 
spray material above the forest canopy, the movement among the trees, and the 
amount of material that actually reaches the ground. Getting the spray material to 
reach the proper location depends on many factors. These factors include: (1) the 
altitude of the aircraft when the material is released, (2) the speed of the aircraft, (3) 
whether the aircraft is an airplane or a helicopter, (4) the type of boom and nozzle 
system used to discharge the spray material, (5) the swath width of each pass of the 
aircraft, (6) the type and density of the forest, (7) wind speed and direction, (8) 
relative humidity, and (9) spray material characteristics. Determining the optimal set 
of factors in order to provide accurate (getting the spray material exactly where it 
should be), and inexpensive (using the exact amount of material; not too much and not 
too little) spraying is the goal of our research. We are currently investigating the use 
of a genetic algorithm to determine the parameters. 

The output of the various computer simulation models typically includes three
important values: the volume median diameter (VMD) and the drift fraction, which
together describe the deposition, and the coefficient of variation (COV). VMD is a
measure of spray material droplet size. It is important to know the expected droplet
size of the spray material as it leaves the aircraft nozzle, and also to know the droplet
size that hits the ground. Variations in these two values are due to a number of factors
including evaporation. Ideally, the spray material is evenly distributed over the entire
spray block. The coefficient of variation gives an indication of the uniformity of the
deposited spray material. The simulation models track the droplets leaving the aircraft
and estimate the events encountered by the droplets as they make their way through 
the aircraft wake and descend onto the spray block (forest or crop area). Some of the 
spray material is likely to drift away from the target area onto adjacent lands. The 
amount of spray material deposited outside the spray block is identified via the drift 
fraction (smaller drift is better since that means the spray material stayed within the 
spray block or evaporated). 
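To make the two uniformity and loss measures concrete, the sketch below computes a coefficient of variation and a drift fraction from made-up deposition numbers. The exact definitions used inside AGDISP are not given here, so the formulas (population standard deviation over mean, and material deposited outside the block over material released) are assumptions, as are the function names and sample values.

```python
import statistics

def coefficient_of_variation(deposits):
    """Population standard deviation divided by the mean of the samples."""
    mean = statistics.fmean(deposits)
    return statistics.pstdev(deposits) / mean

def drift_fraction(deposited_outside_block, total_released):
    """Share of the released material that lands outside the spray block."""
    return deposited_outside_block / total_released

# Hypothetical deposition samples taken across the spray block.
even = [1.0, 1.0, 1.0, 1.0]
uneven = [0.5, 1.5, 0.5, 1.5]

print(coefficient_of_variation(even))    # 0.0
print(coefficient_of_variation(uneven))  # 0.5
print(drift_fraction(3.0, 100.0))        # 0.03
```

A perfectly even deposit gives a COV of zero, and the value grows as the deposit becomes patchier, which is why lower values are preferred for both measures.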



4 The Genetic Algorithm 

Genetic Algorithms [5,6,7] are heuristic search routines that are guided by a model of 
Darwin's theory of natural selection or the survival of the fittest. Here the fittest 
means the most highly ranked solution in a large solution space. The basic idea 
behind the genetic search strategy is to generate solutions that converge on the global 
maximum (i.e., the best solution in the search space) regardless of the "terrain" of the 
search space. A typical terrain might resemble the Great Smoky Mountains with many 
peaks and valleys, an area that is relatively flat, and a highest peak (Clingman's 
Dome). One characteristic of genetic algorithms is that they are relatively resistant
to the pitfalls of hill-climbing, such as being misled by some local maximum. The key
to finding the global maximum lies in the ability to evaluate and compare possible
optimal solutions.







The basic operations involved in a genetic algorithm (GA) are: 1) mate 
selection, 2) crossover, and 3) mutation. Typically, the major data structure is a 
binary string representing the possible solutions. In GA terms, a bit string 
corresponds to an individual, and a set of individuals is called a population. The 
fitness or strength of an individual is computed using some objective or fitness 
function, and is used to compare an individual with other individuals in the same 
population. During mate selection, parent strings are stochastically selected, 
according to their fitness, from the current population. Then, parent strings are
"mated" via crossover to produce offspring for the next generation. Fitter parents
contribute more offspring to the next generation than weaker parents because they
have a higher probability of being selected for mating. This is the step that models the
process of natural selection. Mutation then makes occasional random changes to the
offspring, which helps preserve diversity in the population.
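The three basic operations can be sketched on bit strings as follows. This is a minimal illustration rather than the SAGA implementation: the toy objective (counting 1 bits), the population size, the string length, and all function names are assumptions. The crossover and mutation probabilities reuse the default values quoted later in the paper.

```python
import random

def roulette_select(population, fitnesses, rng):
    """Pick one parent with probability proportional to its fitness."""
    return rng.choices(population, weights=fitnesses, k=1)[0]

def one_point_crossover(a, b, rng):
    """Swap the tails of two equal-length bit strings at a random cut point."""
    cut = rng.randint(1, len(a) - 1)
    return a[:cut] + b[cut:], b[:cut] + a[cut:]

def mutate(bits, rate, rng):
    """Flip each bit independently with probability `rate`."""
    return "".join(
        ("1" if c == "0" else "0") if rng.random() < rate else c
        for c in bits
    )

def next_generation(population, fitness_fn, rng,
                    crossover_prob=0.65, mutation_rate=0.007):
    """Build a new population of the same size from the current one."""
    fitnesses = [fitness_fn(ind) for ind in population]
    offspring = []
    while len(offspring) < len(population):
        p1 = roulette_select(population, fitnesses, rng)
        p2 = roulette_select(population, fitnesses, rng)
        if rng.random() < crossover_prob:
            p1, p2 = one_point_crossover(p1, p2, rng)
        offspring.append(mutate(p1, mutation_rate, rng))
        offspring.append(mutate(p2, mutation_rate, rng))
    return offspring[:len(population)]

def ones(bits):
    """Toy objective (an assumption): number of 1 bits, plus 1 so that
    the roulette weights can never all be zero."""
    return bits.count("1") + 1

rng = random.Random(0)
pop = ["".join(rng.choice("01") for _ in range(16)) for _ in range(20)]
for _ in range(30):
    pop = next_generation(pop, ones, rng)
print("best number of 1 bits:", max(ones(ind) for ind in pop) - 1)
```

Because selection is proportional to fitness, strings with more 1 bits are chosen as parents more often, so the population tends to drift toward fitter strings over the generations.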



5 SAGA 

Our spray advisor GA sends a set of AGDISP parameters to the AGDISP simulation 
model. The AGDISP model calculates and sends back the deposition (drift fraction 
and deposit VMD) and variation for each parameter set. Based on the fitness function 
values (the values range from zero to 10,000) mapped from deposition and the 
coefficient of variation, the GA evolves an improved set of parameters and sends it to 
AGDISP. This process is repeated from generation to generation for each individual 
in the population until a satisfactory deposition is found. The corresponding 
parameter set is returned as the proposed set-up to achieve the desired deposition. 
Currently, we focus on the eleven parameters listed in Table 1. Other less important
or more static parameters are kept constant during our experiments. However, they
can become part of the variable parameter set (i.e., we can easily add further
parameters to the parameter set we are searching for) by specifying them at the
beginning of each SAGA run.



Table 1. SAGA Parameters and Their Ranges

PARAMETER                             LOWER    UPPER
VMD Input (µm)                        100      400
Nonvolatile Fraction                  0.001    1.0
Wind Speed (m/s)                      0.23     4.47
Temperature (degC)                    1        30
Boom Height (m)                       3        30
Swath Width (fraction of wingspan)    0.3      3.0
Humidity (%)                          0.0      1.0
Aircraft ID Number                    1        124
Boom Length (fraction of wingspan)    0.3      1.0
Number of Nozzles                     1        60
Block Size (m)                        50       1000
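A parameter configuration drawn from the ranges in Table 1 might be generated as in the sketch below. The bounds are copied from the table, but the key names, the choice to treat the aircraft ID and nozzle count as integers, and the preset argument for holding a parameter constant are assumptions made for illustration.

```python
import random

# (name, lower, upper, is_integer); bounds copied from Table 1.
# Treating aircraft ID and nozzle count as integers is an assumption.
PARAMETER_RANGES = [
    ("vmd_input_um", 100, 400, False),
    ("nonvolatile_fraction", 0.001, 1.0, False),
    ("wind_speed_m_s", 0.23, 4.47, False),
    ("temperature_degC", 1, 30, False),
    ("boom_height_m", 3, 30, False),
    ("swath_width_frac", 0.3, 3.0, False),
    ("humidity", 0.0, 1.0, False),
    ("aircraft_id", 1, 124, True),
    ("boom_length_frac", 0.3, 1.0, False),
    ("number_of_nozzles", 1, 60, True),
    ("block_size_m", 50, 1000, False),
]

def random_individual(rng, preset=None):
    """Draw one configuration; parameters named in `preset` are held
    at the given value instead of being sampled."""
    preset = preset or {}
    individual = {}
    for name, lo, hi, is_int in PARAMETER_RANGES:
        if name in preset:
            individual[name] = preset[name]
        elif is_int:
            individual[name] = rng.randint(lo, hi)
        else:
            individual[name] = rng.uniform(lo, hi)
    return individual

print(random_individual(random.Random(0), preset={"aircraft_id": 110}))
```

Each such dictionary corresponds to one individual that could be passed to the simulation model for evaluation.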








We use a variant component of AGDISP DOS Version 7.0 for the AGDISP 
computation engine in SAGA. AGDISP DOS Version 7.0 has several advantages 
including reading its input parameters from ASCII data files, displaying output 
information to the screen as the run proceeds, and writing deposition output to a text 
file. The variant we use is a special dynamic link library (DLL) version that allows 
SAGA to interface with the DLL through a system of procedure calls and message 
passing. This gives us the ability to make full use of the AGDISP simulation
model without having to deal with the standard AGDISP user interface. The DLL 
version also has direct access to the aircraft characteristics database (a file containing 
specific physical and flight characteristics of 124 recognized aerial spray aircraft), and 
to the spray materials database (a file containing the specifications for a variety of 
aerial spray materials such as fire retardants, pesticides, and herbicides). The DLL 
was developed using the FORTRAN programming language while SAGA (both the 
genetic algorithm search engine and the user interface) was developed using Microsoft 
Visual Basic 5.0. 

The user interfaces with SAGA via a variety of input option windows. These 
interface windows are designed to provide users with flexibility and convenience to 
group user-defined GA parameters, preset necessary spray parameters, and 
dynamically view the output information. 

Depending on the user’s knowledge of genetic algorithms and the application 
purpose, the user can select either [Gypsy Moth GA Parameters] which is a parameter 
set especially for Gypsy Moth spray applications, [Cool GA Parameters] which is a set 
of recommended GA parameters for testing purposes, or the advanced [Customized 
Parameters]. If the user selects the [Customized Parameters], groups of GA 
parameters will be displayed and the user can modify the default settings according to 
their specific application needs. 

In practical spray applications, it is quite common that some spray parameters can 
and should be fixed based on the spray requirements. We thus provide the option to 
preset certain spray parameters by selecting [Preset Parameters]. A new interface 
window will appear with the eleven spray parameters listed. The user can select the 
ones to preset and fill in appropriate values. The rest of the parameters are left open 
to evolution by SAGA. 

The other portion of the main interface is designed to display intermediate results 
with two options provided. The first option allows the user to dynamically view the 
values of the eleven spray parameters and the three spray results. These values are 
associated with the best individual found so far as the program evolves from 
generation to generation. This option is set as the default output mode. The user can 
also click on the [View Chart] button to switch to a fitness growth chart with the 
maximum and average fitness values displayed dynamically. 

After the user finishes setting the GA and spray parameters, clicking on the [Run 
SAGA] button starts the run, or clicking [Reset Window] resets the parameters to their 
default values. Besides the spray parameters and results displayed dynamically in the 
main interface, the user can also click on [View Convergence Log] to look at a 
detailed report. 







6 The Genetic Algorithm Used in SAGA 

The Genetic Algorithm driver in this study originated from the Simple Genetic 
Algorithm (SGA) described by Goldberg [6]. The GA initializes the first population 
with individuals generated at random. An individual corresponds to a set of AGDISP 
parameters. We use a real number representation for the individuals. 

We provide various GA options that users can select from in order to set GA 
parameters for SAGA. The user can enter population size, generations, crossover 
probability, and mutation probability into the text areas. Each of these parameters has 
a default value, e.g., 100 for Popsize, 80 for Generations, 0.65 for Crossover 
Probability, 0.007 for Mutation Probability. For the GA operators, we provide several 
options. For the selection scheme, users can choose among Naive Roulette Wheel
selection, Tournament selection, and Binary selection. For the crossover operation,
users have the options of 1-point, 2-point, uniform, and average crossover. Detailed
discussions of the working principles of these selection and crossover schemes are 
available in [6]. We have Jump Mutation and Creep Mutation for mutation options. 
The former randomly selects a new value for a parameter within its valid range. The 
latter changes the old parameter by a small increment (error checking is added to 
make sure the new value is valid). Besides these basic GA parameters, we also add 
some other features such as Elitism, which will enable the GA to inherit the best 
individual from the previous generation when turned on. Another useful option is 
Fitness Scaling which is an advanced GA feature used to overcome "local maximum" 
convergence problems. With Elitism and Fitness Scaling turned on, SAGA normally
converges in fewer than 30 generations. The GA population becomes basically
homogeneous after that and there is no need to run the program much longer. We
thus provide a Stable Generations option so the user can specify how many stable 
generations (no changes in maximum fitness) are allowed before stopping SAGA. 
The current default value is ten. The user can also specify the tournament size used in 
the tournament selection scheme. The recommended value is two for selection in 
pairs. 



7 Results and Discussion 

For comparison and to test the feasibility of SAGA, we designed, with help from 
aerial spray experts, a pseudo exhaustive test. We fixed eight spray parameters
(droplet size, temperature, humidity, aircraft, boom length, nozzles, block size, and
swath width) and used the exhaustive combinations of the other three parameters
(non-volatile fraction, wind speed, and boom height). These eleven parameters were
imported into AGDISP to produce batch results, and we used the same fitness function
as in SAGA to obtain their fitness values. The experiment took several days. It should be noted that
this pseudo exhaustive experiment is dependent on the increment step adopted. 
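The dependence on the increment step can be illustrated by counting grid points for the three free parameters. The ranges come from Table 1, while the step sizes below are assumptions chosen only to show the effect; the steps actually used in the experiment are not stated.

```python
import itertools

def grid(lo, hi, step):
    """Evenly spaced values lo, lo + step, ... up to hi."""
    count = int(round((hi - lo) / step)) + 1
    return [lo + i * step for i in range(count)]

def combination_count(ranges_and_steps):
    """Number of grid combinations for a list of (lo, hi, step) triples."""
    total = 1
    for lo, hi, step in ranges_and_steps:
        total *= len(grid(lo, hi, step))
    return total

# Non-volatile fraction, wind speed (m/s), boom height (m); steps assumed.
coarse = [(0.001, 1.0, 0.111), (0.23, 4.47, 0.53), (3.0, 30.0, 3.0)]
fine = [(lo, hi, step / 2) for lo, hi, step in coarse]

print(combination_count(coarse))  # 900
print(combination_count(fine))    # 6137

# The combinations themselves can be enumerated lazily:
first = next(itertools.product(*(grid(*r) for r in coarse)))
print(first)  # (0.001, 0.23, 3.0)
```

Halving each step multiplies the number of simulation runs by almost seven in this example, which is why the result of such a test is only exhaustive relative to the chosen increments.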

We then ran SAGA with the same eight spray parameters set to the same values,
and let SAGA evolve Non-Volatile Fraction, Wind Speed, and Boom Height to obtain
their best values as well as the best spray results. Various crossover and mutation
probability combinations were used. Each experiment took about 1.5 hours to finish
and the best result from SAGA was found among the top 0.1% of the exhaustive
results. Table 2 shows a side-by-side comparison of the best exhaustive and best SAGA
results. These values are a good indication that SAGA is capable of finding
near-optimal solutions for our spray application in a relatively short time.



Table 2. Best Exhaustive and Best SAGA Comparison

                         Exhaustive Test    GA Test
Maximum Fitness          9428.176           9427.255
Non-Volatile Fraction    0.78               0.789324
Wind Speed (m/s)         0.28               0.2823317
Boom Height (m)          6.100002           5.776807
Drift Fraction           3.087256E-02       2.968099E-02
COV                      0.1648742          0.1667711
VMD (micron)             101.625            104.2233



With no spray parameters fixed, SAGA is expected to generate better results 
compared to those with certain parameter restrictions and, in fact, this is the case. The 
best fitness and the corresponding parameters we obtained are listed in Table 3. 



Table 3. The Maximum Fitness from SAGA without Restrictions
(GA crossover probability=0.65 and mutation probability=0.007)

                                      Best Results
Maximum Fitness                       9924.08
DSD-VMD (micron)                      100
Non-Volatile Fraction                 0.788
Wind Speed (m/s)                      0.264
Temperature (degC)                    4.941
Humidity (%)                          62.71488
Aircraft                              110
Boom Length (fraction of wingspan)    0.529
Nozzles                               9
Boom Height (m)                       7.086
Block Size (m)                        964.9
Swath Width (fraction of wingspan)    0.543
Drift Fraction                        0.00301
COV                                   0.0242
VMD (micron)                          99.58



The expected spray droplet size is 100 microns (Drop Size Distribution). Small
droplet sizes and higher non-volatile fractions are desired to avoid unnecessary
evaporation and attrition loss. The non-volatile fraction of 0.788 corresponds to a
high resistance to evaporation of the spray material. The weather conditions during
spraying should approximate those shown: wind speed of 0.264 m/s, temperature
of 4.941 degC, and relative humidity of 62.71488%. These parameters contribute
to less spray evaporation and drift. The recommended aircraft number is 110, which
refers to the Hiller FH 1100 helicopter. The ratio of boom length to aircraft wingspan is
recommended to be 0.529, and the ratio of swath width to wingspan should be 0.543
(actually not a very practical setting, since a narrower swath means more swaths). The
aircraft boom should have nine nozzles to spray the material. The recommended boom
height, which refers to the height of the aircraft above the forest canopy, should be
7.086 m. The ideal spray area should be a square with 964.9 m edges.

Based on these spray parameters, the best spray result we obtain has a very low
drift fraction of 0.00301. An important goal is to minimize drift loss in order to reduce
waste and achieve better spray coverage. COV is found to be 0.0242. A low value
of COV is desired in order to achieve even distribution of the spray material. The
simulated VMD is 99.58 microns, which is very close to the desired size. This
indicates there is little evaporation or attrition loss of the spray droplets. Taken as a
whole, these values indicate that the GA is capable of finding highly fit sets of
parameters. However, in practice, there are typically several constraints to deal with
in the spray scenario. In any case, the GA is highly robust and finds highly fit settings.

We also ran two groups of experiments based on the practical spray parameter 
specification scenarios provided by USDA Forest Service experts. The maximum 
fitness obtained based on the first group of specifications was 9710.885. The spray 
parameters corresponding to this maximum fitness are: evolving - non-volatile 
fraction, wind speed, temperature, humidity, boom length, nozzles, and boom height; 
fixed - aircraft, block size, swath width, and droplet size. The notable evolved value 
was a boom height of 3.22 meters. This is an ideal height for crop application but 
much too close to the treetops for forest application. However, the closer the spray is 
released to the target the higher the deposition. 

The second group of experiments had a higher fitness of 9750.743. Boom height, 
aircraft, boom length and swath width were allowed to evolve in this case. Although 
more parameters were fixed, the experiment setup is a more realistic scenario that is 
likely to be encountered by aerial spray applicators (usually they fly in the early 
morning when the winds are calm, and the temperature and humidity are mild). 

These results were evaluated by forest experts and regarded as excellent 
predictions with high practical importance. More experiments are to be run to test 
other scenarios and the results are expected to assist practical spray applications, 
including selecting optimal spray conditions, estimating spray results, reducing spray 
cost, and minimizing spray drift. 

We ran numerous experiments to determine which GA parameters appeared to 
produce the best results. The selection of GA parameters such as population size, 
number of generations, crossover type and probability, and mutation probability is a 
key facet of the speed and success of the evolutionary process. These parameters are 
typically domain dependent. Our current GA parameter setup includes a population 
size of 100, between 50 and 100 generations, a crossover probability between 0.65 
and 0.85, and a mutation probability between 0.005 and 0.02. 

Another key issue in the development of SAGA is the mapping of the drift
fraction, COV and VMD onto the fitness function. Our goal is to minimize the drift
fraction, minimize the COV, and minimize the difference between the output VMD
and the desired VMD. That is, get the exact amount of spray material evenly
distributed over the spray block with the least loss due to evaporation and attrition.
Our current fitness formulation is given below. Drift fraction has a higher weight
because our main goal is to reduce drift and maximize deposition.

We are currently working to incorporate AGDISP parameter dependencies and 
practical application considerations (spray knowledge) into a revised fitness measure. 
We are also working to improve the GA computing engine to produce better results 
faster. Our proposed new features include Engineered Conditioning [12] and other 
Fitness Scaling methods besides the linear scaling we are using now. Due to historical 
and technological reasons [11], COV is recommended to be close to 0.3. Once we 
have sufficient reliability with SAGA, we plan to investigate the appropriateness of 
this rule-of-thumb constant. Our progress so far is promising, and we expect to make
SAGA more comprehensive and reliable by incorporating feedback from forest managers
and catering to the practical needs of aerial spray applicators. Our long-term goal is
to make SAGA an instrumental tool for aerial spray applications.



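$$\text{Fitness} = 100 \times \left\{ \left[ 50 \times (1.0 - DF) \right] + \left[ 25 \times (1 - COV) \right] + \left[ 25 \times \exp\left( -8.0 \times \left( \frac{VMD}{VMD_{Ctr}} - 1 \right)^{2} \right) \right] \right\}$$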
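The fitness mapping can be sketched as a small function. The weights (50, 25, 25), the factor of 100, and the constant 8.0 come from the formulation above. Writing the VMD term as a squared relative deviation from the target is an assumption, as are the function and argument names and the target VMD of 100 microns used in the check. With that form, the sketch reproduces the exhaustive-test fitness reported in Table 2.

```python
import math

def saga_fitness(drift_fraction, cov, vmd, vmd_target):
    """Fitness in the range 0 to 10,000; higher is better.

    Assumes the VMD term is 25 * exp(-8 * (vmd / vmd_target - 1)^2).
    """
    drift_term = 50.0 * (1.0 - drift_fraction)
    cov_term = 25.0 * (1.0 - cov)
    vmd_term = 25.0 * math.exp(-8.0 * (vmd / vmd_target - 1.0) ** 2)
    return 100.0 * (drift_term + cov_term + vmd_term)

# Best exhaustive-test row of Table 2, with an assumed target VMD of 100 microns.
print(round(saga_fitness(3.087256e-02, 0.1648742, 101.625, 100.0), 3))  # 9428.176

# A hypothetical perfect result: no drift, zero COV, VMD exactly on target.
print(saga_fitness(0.0, 0.0, 100.0, 100.0))  # 10000.0
```

Because the drift term carries half of the total weight, a given reduction in drift fraction raises the fitness twice as much as the same reduction in COV.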



References 

1. Bachant, J., and J. McDermott (1984). “R1 Revisited: Four Years in the 
Trenches” in AI Magazine, Vol. 5, No. 3. 

2. Bilanin, A. J., M. E. Teske, J. W. Barry and R. B. Ekblad (1989). “AGDISP: 
The Aircraft Spray Dispersion Model, Code Development and Experimental 
Validation” in Transactions of the ASAE 32(1):327-334.

3. Chandrasekaran, B., and S. Mittal (1983). “Deep versus Compiled Knowledge 
Approaches to Diagnostic Problem Solving,” in the International Journal of 
Man-Machine Studies, Vol. 19, No. 5, pp. 425-436, November. 

4. Chandrasekaran, B. (1991). “Models Versus Rules, Deep Versus Compiled, 
Content Versus Form” in IEEE Expert, Vol. 6, No. 5, April. 

5. Davis, L., (ed.) (1991). Handbook of Genetic Algorithms, Van Nostrand 
Reinhold, New York. 

6. Goldberg, D.E. (1989). Genetic Algorithms in Search, Optimization, and 
Machine Learning, Addison-Wesley Publishing Co. 

7. Holland, J.H. (1975). Adaptation in Natural and Artificial Systems, Ann Arbor: 
The University of Michigan Press. 

8. Mantelman, L. (1986). “AI Carves Inroads: Network Design, Testing, and
Management” in Data Communications, pp. 106-123.

9. McDermott, J. (1981). “R1: The Formative Years,” in AI Magazine, Vol. 2,
No. 2.

10. Mittal, S., and F. Frayman (1989). “Towards a generic model of configuration 
tasks” in the Proceedings of the Eleventh International Joint Conference on 
Artificial Intelligence, Vol. 2, pp. 1395-1401. 







11. Parkin, C.S., and J.C. Wyatt (1982). “The Determination of Flight-Lane 
Separations for the Aerial Application of Herbicides” in Crop Protection, 1 (3), 
pp. 309-321. 

12. Potter, W.D., J.A. Miller, B.E. Tonn, R.V. Gandham, and C.N. Lapena (1991). 
“Improving the Reliability of Heuristic Multiple Fault Diagnosis Via The 
Environmental Conditioning Operator” in the International Journal of Applied 
Intelligence, vol. 2, pp. 5-23. 

13. Potter, W.D., R. Pitts, P. Gillis, J. Young, and J. Caramadre (1992) “IDA-NET: 
An Intelligent Decision Aid for Battlefield Communication Network 
Configuration” in the Proceedings of the Eighth IEEE Conference on Artificial 
Intelligence for Applications (CAIA’92), pp. 247-253. 

14. Teske, M.E., Curbishley, T.B. (1989), “Forest Service Aerial Spray Computer 
Model FSCBG 4.0 User Manual” C.D.I. Report No. 90-06. 

15. Teske, M.E., Bowers, J.F., Rafferty, J.E., and Barry, J.W., (1993). “FSCBG: An 
Aerial Spray Dispersion Model for Predicting the Fate of Released Material 
Behind Aircraft” in Environmental Toxicology and Chemistry, Vol. 12, pp. 453- 
464. 

16. Teske, M.E. (1998) “AGDISP DOS Version 7.0 User Manual”. 

17. Teske, M. E., H. W. Thistle and B. Eav. (1998) “New Ways to Predict Aerial 
Spray Deposition and Drift” in Journal of Forestry 96(6):25-31.




Dynamic Data Mining* 



Vijay Raghavan and Alaaeldin Hafez^ 

The Center for Advanced Computer Studies, University of Louisiana at Lafayette
Lafayette, LA 70504, USA
{raghavan, ahafez}@cacs.louisiana.edu



Abstract. Business information received from advanced data analysis and data
mining is a critical success factor for companies wishing to maximize
competitive advantage. The use of traditional tools and techniques to discover
knowledge is ruthless and does not give the right information at the right time.
Data mining should provide tactical insights to support the strategic directions.
In this paper, we introduce a dynamic approach that uses knowledge discovered
in previous episodes. The proposed approach is shown to be effective for
solving problems related to the efficiency of handling database updates,
accuracy of data mining results, gaining more knowledge and interpretation of
the results, and performance. Our results do not depend on the approach used to
generate itemsets. In our analysis, we have used an Apriori-like approach as a
local procedure to generate large itemsets. We prove that the Dynamic Data
Mining algorithm is correct and complete.



1 Introduction 

Data mining is the process of discovering potentially valuable patterns, associations, 
trends, sequences and dependencies in data [1-4,10,14,17,20,21]. Key business 
examples include web site access analysis for improvements in e-commerce 
advertising, fraud detection, screening and investigation, retail site or product 
analysis, and customer segmentation. Data mining techniques can discover 
information that many traditional business analysis and statistical techniques fail to 
deliver. Additionally, the application of data mining techniques further exploits the
value of a data warehouse by converting expensive volumes of data into valuable assets
for future tactical and strategic business development. Management information
systems should provide advanced capabilities that give the user the power to ask more
sophisticated and pertinent questions. Such systems empower the right people by
providing the specific information they need.

Many knowledge discovery applications [6,8,9,12,13,15,16,18,19], such as on-line
services and world wide web applications, require accurate mining information from
data that changes on a regular basis. In such an environment, frequent or occasional
updates may change the status of some rules discovered earlier. More information
should be collected during the data mining process to allow users to gain more
complete knowledge of the significance or the importance of the generated data
mining rules.

* This research was supported in part by the U.S. Department of Energy, Grant No.
DE-FG02-97ER1220.

^ On leave from The Department of Computer Science and Automatic Control, Faculty of
Engineering, Alexandria University, Alexandria, Egypt

R. Loganantharaj et al. (Eds.): IEA/AIE 2000, LNAI 1821, pp. 220-229, 2000.
© Springer-Verlag Berlin Heidelberg 2000

Discovering knowledge is an expensive operation [2,4,5,6,9,10,11]. It requires
extensive access to secondary storage, which can become a bottleneck for efficient
processing. Running data mining algorithms from scratch each time there is a change
in the data is obviously not an efficient strategy. Using previously discovered
knowledge along with new data updates to maintain discovered knowledge could
solve many problems that have faced data mining techniques, namely database
updates, accuracy of data mining results, gaining more knowledge and interpretation
of the results, and performance.

In this paper, we propose an approach that dynamically updates knowledge
obtained from the previous data mining process. Transactions over a long duration are
divided into a set of consecutive episodes. In our approach, information gained during 
the current episode depends on the current set of transactions and the discovered 
information during the last episode. Our approach discovers current data mining rules 
by using updates that have occurred during the current episode along with the data 
mining rules that have been discovered in the previous episode. 

In section 2, a formal definition of the problem is given. The dynamic data mining 
approach is introduced in section 3. In section 4, the dynamic data mining approach is 
evaluated. The paper is summarized and concluded in section 5. 



2 Problem Definition 

Association mining that discovers dependencies among values of an attribute was
introduced by Agrawal et al. [1] and has emerged as an important research area. The
problem of association mining, also referred to as the market basket problem, is
formally defined as follows. Let $I = \{i_1, i_2, \ldots, i_n\}$ be a set of items and
$S = \{s_1, s_2, \ldots, s_m\}$ be a set of transactions, where each transaction $s_i \in S$ is a set of
items, that is, $s_i \subseteq I$. An association rule, denoted by $X \Rightarrow Y$, where $X, Y \subset I$ and
$X \cap Y = \emptyset$, describes the existence of a relationship between the two itemsets X and Y.

Several measures have been introduced to define the strength of the relationship
between itemsets $X$ and $Y$, such as SUPPORT, CONFIDENCE, and
INTEREST [1, 2, 5, 7]. The definitions of these measures, from a probabilistic
viewpoint, are given below.

I. $\mathrm{SUPPORT}(X \Rightarrow Y) = P(X, Y)$, or the percentage of transactions in the database that
contain both $X$ and $Y$.

II. $\mathrm{CONFIDENCE}(X \Rightarrow Y) = P(X, Y) / P(X)$, or the percentage of transactions
containing $Y$ among those transactions containing $X$.

III. $\mathrm{INTEREST}(X \Rightarrow Y) = P(X, Y) / (P(X)\,P(Y))$ represents a test of statistical
independence.
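The three measures can be illustrated with a minimal Python sketch. The function names and the four-transaction toy database below are assumptions made for illustration only; they are not part of the paper.

```python
def probability(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    itemset = frozenset(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def support(x, y, transactions):
    """SUPPORT(X => Y) = P(X, Y)."""
    return probability(set(x) | set(y), transactions)


def confidence(x, y, transactions):
    """CONFIDENCE(X => Y) = P(X, Y) / P(X)."""
    return support(x, y, transactions) / probability(x, transactions)


def interest(x, y, transactions):
    """INTEREST(X => Y) = P(X, Y) / (P(X) * P(Y))."""
    return support(x, y, transactions) / (
        probability(x, transactions) * probability(y, transactions)
    )


# Hypothetical toy database (not taken from the paper).
transactions = [
    frozenset({"a", "b"}),
    frozenset({"a", "b", "c"}),
    frozenset({"a", "c"}),
    frozenset({"b", "c"}),
]

print(support({"a"}, {"b"}, transactions))     # 2 of 4 transactions contain a and b
print(confidence({"a"}, {"b"}, transactions))  # 2 of the 3 transactions with a also hold b
print(interest({"a"}, {"b"}, transactions))    # below 1: a and b co-occur less than if independent
```

An INTEREST value of 1 corresponds to statistical independence of $X$ and $Y$; values above or below 1 indicate positive or negative dependence.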




222 Vijay Raghavan and Alaaeldin Hafez 



SUPPORT for an itemset $S$ is calculated as

$$\mathrm{SUPPORT}(S) = \frac{F(S)}{F}$$

where $F(S)$ is the number of transactions having $S$, and $F$ is the total number of
transactions.

For a minimum SUPPORT value MINSUP, $S$ is a large (or frequent) itemset if

$$\mathrm{SUPPORT}(S) \ge \mathrm{MINSUP}, \quad \text{or} \quad F(S) \ge F \cdot \mathrm{MINSUP}.$$



Suppose we have divided the transaction set $T$ into two subsets $T_1$ and $T_2$,
corresponding to two consecutive time intervals, where $F_1$ is the number of
transactions in $T_1$ and $F_2$ is the number of transactions in $T_2$ ($F = F_1 + F_2$), and $F_1(S)$ is
the number of transactions having $S$ in $T_1$ and $F_2(S)$ is the number of transactions
having $S$ in $T_2$ ($F(S) = F_1(S) + F_2(S)$). By calculating the SUPPORT of $S$ in each of the
two subsets, we get

$$\mathrm{SUPPORT}_1(S) = \frac{F_1(S)}{F_1} \quad \text{and} \quad \mathrm{SUPPORT}_2(S) = \frac{F_2(S)}{F_2}.$$

$S$ is a large itemset if

$$\frac{F_1(S) + F_2(S)}{F_1 + F_2} \ge \mathrm{MINSUP}$$

or

$$F_1(S) + F_2(S) \ge (F_1 + F_2) \cdot \mathrm{MINSUP}.$$



In order to find out whether $S$ is a large itemset or not, we consider four cases:

• $S$ is a large itemset in $T_1$ and also a large itemset in $T_2$, i.e.,
$F_1(S) \ge F_1 \cdot \mathrm{MINSUP}$ and $F_2(S) \ge F_2 \cdot \mathrm{MINSUP}$.

• $S$ is a large itemset in $T_1$ but a small itemset in $T_2$, i.e.,
$F_1(S) \ge F_1 \cdot \mathrm{MINSUP}$ and $F_2(S) < F_2 \cdot \mathrm{MINSUP}$.

• $S$ is a small itemset in $T_1$ but a large itemset in $T_2$, i.e.,
$F_1(S) < F_1 \cdot \mathrm{MINSUP}$ and $F_2(S) \ge F_2 \cdot \mathrm{MINSUP}$.

• $S$ is a small itemset in $T_1$ and also a small itemset in $T_2$, i.e.,
$F_1(S) < F_1 \cdot \mathrm{MINSUP}$ and $F_2(S) < F_2 \cdot \mathrm{MINSUP}$.

In the first and fourth cases, $S$ is a large itemset and a small itemset in transaction
set $T$, respectively, while in the second and third cases it is not clear whether $S$
is a small itemset or a large itemset. Formally speaking, let $\mathrm{SUPPORT}(S) =
\mathrm{MINSUP} + \delta$, where $\delta \ge 0$ if $S$ is a large itemset, and $\delta < 0$ if $S$ is a small itemset. The
above four cases have the following characteristics:

• $\delta_1 \ge 0$ and $\delta_2 \ge 0$

• $\delta_1 \ge 0$ and $\delta_2 < 0$

• $\delta_1 < 0$ and $\delta_2 \ge 0$

• $\delta_1 < 0$ and $\delta_2 < 0$

$S$ is a large itemset if

$$\frac{F_1 (\mathrm{MINSUP} + \delta_1) + F_2 (\mathrm{MINSUP} + \delta_2)}{F_1 + F_2} \ge \mathrm{MINSUP}$$

or

$$F_1 (\mathrm{MINSUP} + \delta_1) + F_2 (\mathrm{MINSUP} + \delta_2) \ge \mathrm{MINSUP} \cdot (F_1 + F_2),$$

which can be written as $F_1 \delta_1 + F_2 \delta_2 \ge 0$.

Generally, let the transaction set $T$ be divided into $n$ transaction subsets $T_i$, $1 \le i \le n$.
$S$ is a large itemset if $\sum_{i=1}^{n} F_i \delta_i \ge 0$, where $F_i$ is the number of transactions in $T_i$ and
$\delta_i = \mathrm{SUPPORT}_i(S) - \mathrm{MINSUP}$, with $-\mathrm{MINSUP} \le \delta_i \le 1 - \mathrm{MINSUP}$, $1 \le i \le n$.

For those cases where $\sum_{i=1}^{n} F_i \delta_i < 0$, there are two options, either

• discard $S$ as a large itemset (a small itemset with no history record
maintained), or

• keep it for future calculations (a small itemset with history record
maintained). In this case, we are not going to report it as a large itemset, but
its $\sum_{i=1}^{n} F_i \delta_i$ formula will be maintained and checked through the future
intervals.
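The criterion above can be checked with a small Python sketch. The function names are illustrative assumptions; the per-subset counts for {b} and {d} are obtained by differencing the cumulative counts listed in the example of Section 3, and the subset sizes by summing its transaction counts.

```python
def is_large_incremental(counts, sizes, minsup):
    """Accumulate F_i * delta_i over the subsets and test the sum against 0.

    counts[i] is F_i(S), sizes[i] is F_i, and
    delta_i = SUPPORT_i(S) - MINSUP = counts[i] / sizes[i] - minsup.
    """
    accumulated = 0.0
    for f_s, f in zip(counts, sizes):
        delta = f_s / f - minsup
        accumulated += f * delta
    return accumulated >= 0


def is_large_global(counts, sizes, minsup):
    """Reference test on the whole transaction set: F(S) / F >= MINSUP."""
    return sum(counts) / sum(sizes) >= minsup


sizes = [37, 34, 49]   # F_1, F_2, F_3
b_counts = [16, 9, 18]  # itemset {b}
d_counts = [14, 8, 14]  # itemset {d}

print(is_large_incremental(b_counts, sizes, 0.35), is_large_global(b_counts, sizes, 0.35))
print(is_large_incremental(d_counts, sizes, 0.35), is_large_global(d_counts, sizes, 0.35))
```

Only the running sum of $F_i \delta_i$ (plus the running total of $F_i$) has to be stored per itemset, which is what makes the test usable without rereading earlier subsets. The sketch uses floating-point arithmetic, so an itemset sitting exactly on the threshold could be misclassified; an integer or rational formulation would avoid that.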



3 The Dynamic Data Mining Approach 

For $\sum_{i=k}^{n} F_i \delta_i < 0$, the two options described above could be combined into a single
decision rule that says discard $S$ if

$$\frac{\sum_{i=k}^{n} F_i (\mathrm{MINSUP} + \delta_i)}{\sum_{i=k}^{n} F_i} < \frac{\mathrm{MINSUP}}{\alpha}, \quad \text{where } 1 \le \alpha < \infty \text{ and } k \ge 1.$$

• $\alpha = 1$: discard $S$ from the set of large itemsets (it becomes a small itemset with no
history record).

• $\alpha > 1$: keep it for future calculations (it becomes a small itemset with a history record).

The value of α determines how much history information is carried. This
history information, along with the calculated values of locality, can be used to

• determine the significance or the importance of the generated emerged-large
itemsets;

• determine the significance or the importance of the generated declined-large
itemsets;

• generate large itemsets with lower SUPPORT values without having to rerun
the mining procedure.

The choice of the value of α is the essence of our approach. If α is chosen
near 1, we will have fewer declined-large itemsets and
more emerged-large itemsets, and those emerged-large itemsets are more likely to
have occurred near the latest episodes. If α is
chosen far from 1, we will have more declined-large itemsets and
fewer emerged-large itemsets, and those emerged-large itemsets are more likely to be large
itemsets in the Apriori-like approach.
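The role of α can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function name and the returned labels are hypothetical, the itemset is assumed to have been large earlier, and the counts (22 occurrences of {d} in 71 transactions after the second subset) are read from the example later in this section.

```python
def keep_or_discard(count, size, minsup, alpha):
    """Classify an itemset that was large earlier, given its accumulated
    count and the accumulated number of transactions since its history began.

    Returns "large" if the support still reaches MINSUP, "keep" if it lies in
    [MINSUP / alpha, MINSUP), and "discard" otherwise.
    """
    support = count / size
    if support >= minsup:
        return "large"
    if support >= minsup / alpha:
        return "keep"      # small itemset, history record maintained
    return "discard"       # small itemset, no history record


# Itemset {d} after the second subset: support 22 / 71, about 0.31.
print(keep_or_discard(22, 71, 0.35, alpha=1))  # threshold 0.35
print(keep_or_discard(22, 71, 0.35, alpha=2))  # threshold 0.175
```

With α = 1 the itemset is dropped as soon as its support falls below MINSUP; with α = 2 it survives as long as its support stays at or above MINSUP/2.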







In this section, we introduce the notions of declined-large itemset, emerged-large 
itemset, and locality. 



Definition 3.1: Let $S$ be a large itemset (or an emerged-large itemset, see
Definition 3.2) in a transaction subset $T_l$, $l \ge 1$. $S$ is called a declined-large itemset in
transaction subset $T_n$, $n > l$, if

$$\mathrm{MINSUP} > \frac{\sum_{i=k}^{m} F_i (\mathrm{MINSUP} + \delta_i)}{\sum_{i=k}^{m} F_i} \ge \frac{\mathrm{MINSUP}}{\alpha}$$

for all $l < m \le n$, where $1 \le k \le m$ and $1 \le \alpha < \infty$.



Definition 3.2: $S$ is called an emerged-large itemset in transaction subset $T_n$, $n > 1$,
if $S$ was a small itemset in transaction subset $T_{n-1}$ and $F_n \delta_n \ge 0$, or $S$ was a
declined-large itemset in transaction subset $T_{n-1}$, $n > 1$, and $\sum_{i=k}^{n} F_i \delta_i \ge 0$, $k \ge 1$.



Definition 3.3: For an itemset $S$ and a transaction subset $T_n$, locality($S$) is defined as
the ratio of the total size of those transaction subsets where $S$ is either a large itemset
or an emerged-large itemset to the total size of transaction subsets $T_i$, $1 \le i \le n$:

$$\mathrm{locality}(S) = \frac{\sum_{\forall i \text{ s.t. } S \text{ is a large itemset or an emerged-large itemset in } T_i} F_i}{\sum_{i=1}^{n} F_i}$$

Clearly, locality($S$) = 1 for all large itemsets $S$.
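Definition 3.3 can be illustrated with a short Python sketch. The function name and the boolean-flag representation are assumptions; the subset sizes (37, 34, 49) are obtained by summing the transaction counts of the example that follows, and the flags reflect the status columns of its tables.

```python
def locality(sizes, large_flags):
    """Ratio of the total size of the subsets in which S is large or
    emerged-large to the total size of all subsets seen so far.

    sizes[i] is F_i; large_flags[i] is True where S is large or
    emerged-large in T_i.
    """
    covered = sum(f for f, flag in zip(sizes, large_flags) if flag)
    return covered / sum(sizes)


sizes = [37, 34, 49]

# {a} is emerged-large only in the third subset: 49 / 120.
print(round(locality(sizes, [False, False, True]), 2))
# {d} is large only in the first subset: 37 / 120.
print(round(locality(sizes, [True, False, False]), 2))
```

Rounded to two decimals these give 0.41 and 0.31, the locality values reported for {a} and {d} after the third subset in the example.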



The dynamic data mining approach generates three sets of itemsets:

• large itemsets, which satisfy the rule $\sum_{i=1}^{n} F_i \delta_i \ge 0$, where $n$ is the number of
intervals carried out by the dynamic data mining approach;

• declined-large itemsets, which were large at previous intervals and still
maintain the rule

$$\mathrm{MINSUP} > \frac{\sum_{i=k}^{n} F_i (\mathrm{MINSUP} + \delta_i)}{\sum_{i=k}^{n} F_i} \ge \frac{\mathrm{MINSUP}}{\alpha}$$

for some value α;

• emerged-large itemsets, which were

either small itemsets that at a transaction subset $T_k$ satisfied the
rule $\delta_k \ge 0$ and still satisfy the rule $\sum_{i=k}^{n} F_i \delta_i \ge 0$,

or declined-large itemsets that at a transaction subset $T_m$
satisfied the rule $\sum_{i=k}^{m} F_i \delta_i \ge 0$ and still satisfy the rule $\sum_{i=k}^{n} F_i \delta_i \ge 0$.



Example: Let I={a,b,c,d,e,f,g,h} be a set of items, MINSUP=0.35, and T be a set of 
transactions. 







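For α = 1,

Subset | Transaction | Count | Large or emerged-large itemset | Count | SUPPORT | Status | Locality
T1 | {a,b,g,h} | 3 | {b} | 16 | 0.43 | large itemset | 1
T1 | {b,c,d} | 10 | {c} | 21 | 0.57 | large itemset | 1
T1 | {a,c} | 2 | {d} | 14 | 0.38 | large itemset | 1
T1 | {c,g} | 4 | {h} | 17 | 0.46 | large itemset | 1
T1 | {d,e,f} | 1 | {b,d} | 13 | 0.35 | large itemset | 1
T1 | {e,g,h} | 4 | | | | |
T1 | {a,b,d} | 2 | | | | |
T1 | {b,d,f} | 1 | | | | |
T1 | {d,f,h} | 5 | | | | |
T1 | {c,h} | 5 | | | | |
T2 | {c,h} | 12 | {b} | 25 | 0.35 | large itemset | 1
T2 | {b,d,g} | 8 | {c} | 43 | 0.60 | large itemset | 1
T2 | {a,c} | 9 | {h} | 33 | 0.46 | large itemset | 1
T2 | {b,c} | 1 | {c,h} | 12 | 0.35 | emerged-large itemset |
T2 | {g,h} | 4 | | | | |
T3 | {a} | 10 | {a} | 19 | 0.39 | emerged-large itemset | 0.41
T3 | {a,b,g,h} | 5 | {b} | 43 | 0.36 | large itemset | 1
T3 | {b,c,d} | 10 | {c} | 64 | 0.53 | large itemset | 1
T3 | {a,c} | 2 | {h} | 52 | 0.43 | large itemset | 1
T3 | {c,g} | 4 | | | | |
T3 | {d,e,f} | 1 | | | | |
T3 | {e,g,h} | 4 | | | | |
T3 | {a,b,d} | 2 | | | | |
T3 | {b,d,f} | 1 | | | | |
T3 | {d,f,h} | 5 | | | | |
T3 | {c,h} | 5 | | | | |
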
For α = 2,

Subset | Transaction | Count | Large or emerged-large itemset | Count | SUPPORT | Status | Locality
T1 | {a,b,g,h} | 3 | {b} | 16 | 0.43 | large itemset | 1
T1 | {b,c,d} | 10 | {c} | 21 | 0.57 | large itemset | 1
T1 | {a,c} | 2 | {d} | 14 | 0.38 | large itemset | 1
T1 | {c,g} | 4 | {h} | 17 | 0.46 | large itemset | 1
T1 | {d,e,f} | 1 | {b,d} | 13 | 0.35 | large itemset | 1
T1 | {e,g,h} | 4 | | | | |
T1 | {a,b,d} | 2 | | | | |
T1 | {b,d,f} | 1 | | | | |
T1 | {d,f,h} | 5 | | | | |
T1 | {c,h} | 5 | | | | |
T2 | {c,h} | 12 | {b} | 25 | 0.35 | large itemset | 1
T2 | {b,d,g} | 8 | {c} | 43 | 0.60 | large itemset | 1
T2 | {a,c} | 9 | {d} | 22 | 0.31 | declined-large itemset | 0.52
T2 | {b,c} | 1 | {g} | 12 | 0.35 | emerged-large itemset | 0.48
T2 | {g,h} | 4 | {h} | 33 | 0.46 | large itemset | 1
T2 | | | {b,d} | 18 | 0.25 | declined-large itemset | 0.52
T2 | | | {c,h} | 12 | 0.35 | emerged-large itemset | 0.48
T3 | {a} | 10 | {a} | 19 | 0.39 | emerged-large itemset | 0.41
T3 | {a,b,g,h} | 5 | {b} | 43 | 0.36 | large itemset | 1
T3 | {b,c,d} | 10 | {c} | 64 | 0.53 | large itemset | 1
T3 | {a,c} | 2 | {d} | 36 | 0.30 | declined-large itemset | 0.31
T3 | {c,g} | 4 | {g} | 25 | 0.30 | declined-large itemset | 0.28
T3 | {d,e,f} | 1 | {h} | 52 | 0.43 | large itemset | 1
T3 | {e,g,h} | 4 | {b,d} | 31 | 0.26 | declined-large itemset | 0.31
T3 | {a,b,d} | 2 | {c,h} | 17 | 0.20 | declined-large itemset | 0.28
T3 | {b,d,f} | 1 | | | | |
T3 | {d,f,h} | 5 | | | | |
T3 | {c,h} | 5 | | | | |

When applying an Apriori-like algorithm to the whole file, the resulting large
itemsets are

Large itemset | Count | SUPPORT
{b} | 43 | 0.39
{c} | 64 | 0.58
{h} | 52 | 0.47







By comparing the results in the previous example, we can draw some intuitions
about the proposed approach, which can be summarized as follows:

• The set of large itemsets and emerged-large itemsets generated by our
Dynamic approach is a superset of the set of large itemsets generated by
the Apriori-like approach.

• If there is an itemset generated by our Dynamic approach but not generated
by the Apriori-like approach as a large itemset, then this itemset should be
large at the latest consecutive time intervals, i.e., an emerged-large itemset.

In Lemmas 3.1 and 3.2, we prove the above intuitions.

Lemma 3.1: For a transaction set $T$, the set of large itemsets and emerged-large
itemsets generated by our Dynamic approach is a superset of the set of large itemsets
generated by the Apriori-like approach.

Proof: Let $\bigcup_i T_i = T$, $1 \le i \le n$, $F_i = |T_i|$, and let $S$ be a large itemset that is generated by the
Apriori-like approach, i.e., $\sum_{i=1}^{n} F_i \delta_i \ge 0$, but not by our Dynamic approach. There
are two cases to consider.

Case 1: α = 1

For a transaction subset $T_k$, $1 \le k \le n$, $S$ is discarded from the set of large
itemsets if it becomes a small itemset, i.e., $\sum_{i=m}^{k} F_i \delta_i < 0$, $1 \le m \le k$, and no history
is recorded. Since no history is recorded before $m$, that means $\sum_{i=1}^{m-1} F_i \delta_i < 0$. That
leads to $\sum_{i=1}^{k} F_i \delta_i < 0$. For $k = n$, we have $\sum_{i=1}^{n} F_i \delta_i < 0$, which contradicts our
assumption.

Case 2: α > 1

For a transaction subset $T_k$, $1 \le k \le n$, $S$ is discarded from the set of large
itemsets if it becomes a small itemset, i.e., $\sum_{i=m}^{k} F_i \delta_i < 0$, $1 \le m \le k$, and,
depending on the value of α, its history starts to be recorded in transaction
subset $T_m$. Since no history is recorded before $m$, that means $\sum_{i=1}^{m-1} F_i \delta_i < 0$. That
leads to $\sum_{i=1}^{k} F_i \delta_i < 0$. For $k = n$, we have $\sum_{i=1}^{n} F_i \delta_i < 0$, which contradicts our
assumption.



Lemma 3.2: If there is an itemset generated by our Dynamic approach but not
generated by the Apriori-like approach as a large itemset, then this itemset should be
large at the latest consecutive time intervals, i.e., an emerged-large itemset.

Proof: The proof follows directly from the proof of Lemma 3.1.







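Algorithm DynamicMining($T_n$)

$f_k(T_n)$ is the set of large and emerged-large $k$-itemsets.
$f_k^*(T_n)$ is the set of declined-large $k$-itemsets.
$\Gamma^x$ is the accumulated value of $F_i \delta_i$ for itemset $x$; $\Delta$ is the accumulated value of $F_i$.
$Cl^x$ is the accumulated value of $F_i$ where itemset $x$ is large.

begin
    $\Delta = \Delta + F_n$
    $f_1(T_n) = \{(x, Cl^x),\ Cl^x = F_n \mid x \notin f_1(T_{n-1}) \wedge x \notin f_1^*(T_{n-1}) \wedge F_n \delta_n \ge 0\}$    // emerged-large itemset
        $\cup\ \{(x, Cl^x),\ Cl^x = Cl^x + F_n \mid \Gamma^x + F_n \delta_n \ge 0\}$    // was large itemset
    $f_1^*(T_n) = \{(x, Cl^x) \mid \mathrm{MINSUP} > \frac{\Delta \cdot \mathrm{MINSUP} + \Gamma^x + F_n \delta_n}{\Delta} \ge \frac{\mathrm{MINSUP}}{\alpha}\}$
    for ($k = 2$; $f_{k-1}(T_n) \ne \emptyset$; $k$++) do
    begin
        $C_k$ = AprioriGen($f_{k-1}(T_n) \cup f_{k-1}^*(T_n)$)
        forall transactions $t \in T_n$
            forall candidates $c \in C_k$ do
                if $c \subseteq t$ then $c$.count++
        $f_k(T_n) = \{(x, Cl^x),\ x \in C_k,\ Cl^x = F_n \mid x \notin f_k(T_{n-1}) \wedge x \notin f_k^*(T_{n-1}) \wedge F_n \delta_n \ge 0\}$
            $\cup\ \{(x, Cl^x),\ x \in C_k,\ Cl^x = Cl^x + F_n \mid \Gamma^x + F_n \delta_n \ge 0\}$
        $f_k^*(T_n) = \{(x, Cl^x),\ x \in C_k \mid \mathrm{MINSUP} > \frac{\Delta \cdot \mathrm{MINSUP} + \Gamma^x + F_n \delta_n}{\Delta} \ge \frac{\mathrm{MINSUP}}{\alpha}\}$
    end
    return $f_k(T_n)$ and $f_k^*(T_n)$
end

function AprioriGen($f_{k-1}$)
    insert into $C_k$
        select $l_1, l_2, \ldots, l_{k-1}, c_{k-1}$
        from $f_{k-1}\ l$, $f_{k-1}\ c$
        where $l_1 = c_1 \wedge l_2 = c_2 \wedge \ldots \wedge l_{k-2} = c_{k-2} \wedge l_{k-1} < c_{k-1}$
    delete all items $c \in C_k$ such that ($k-1$)-subsets of $c$ are not in $f_{k-1}(T_n)$
    return $C_k$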
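The candidate-generation step can be sketched in Python. This is the standard Apriori join-and-prune procedure written under assumptions: itemsets are represented as sorted tuples, the name `apriori_gen` is illustrative, and the union with declined-large itemsets that DynamicMining passes in is left to the caller.

```python
from itertools import combinations


def apriori_gen(prev_itemsets):
    """Build candidate k-itemsets from a collection of (k-1)-itemsets.

    Join step: combine two (k-1)-itemsets that agree on their first k-2
    items. Prune step: drop candidates having a (k-1)-subset that is not
    in the input collection.
    """
    prev = {tuple(sorted(s)) for s in prev_itemsets}
    if not prev:
        return set()
    k = len(next(iter(prev))) + 1

    candidates = set()
    for l in prev:
        for c in prev:
            if l[:-1] == c[:-1] and l[-1] < c[-1]:
                candidates.add(l + (c[-1],))

    return {
        cand
        for cand in candidates
        if all(sub in prev for sub in combinations(cand, k - 1))
    }


# Hypothetical input: four 2-itemsets.
pairs = [("a", "b"), ("a", "c"), ("b", "c"), ("b", "d")]
print(apriori_gen(pairs))  # ("b", "c", "d") is pruned because ("c", "d") is missing
```

Because every tuple is kept sorted, `itertools.combinations` yields subsets in the same canonical order, so the membership test in the prune step needs no further normalization.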



lemma 3.3: The Dynamic Data Mining approach is correct. 
proof: (See lemmas 3.1 and 3.2) 



4 Analysis and Performance Study 

In the DynamicMining algorithm, we used an Apriori-like approach as a local
procedure to generate large or emerged-large itemsets. We would like to emphasize
that our approach does not depend on the approach used to generate itemsets.
The main contribution of our approach is to dynamically generate large itemsets
using only the transaction updates and the information collected in the previous data
mining episode.

Assuming that an Apriori-like procedure is used as a local procedure, the total
number of disk accesses needed for performing the DynamicMining algorithm is
$\sum_{i=1}^{n} K_i N_i$, where $N_i$ is the size (in disk blocks) of the transaction subset $T_i$, $1 \le i \le n$,
and $K_i$ is the length of the longest large itemset. On the other hand, the total number of
disk accesses needed for performing an Apriori-like algorithm, which is carried out each
time on the whole transaction file, is

$$\sum_{i=1}^{n} K_i \sum_{j=1}^{i} N_j ,$$

where $\sum_{i=1}^{n} N_i$ is the number of disk blocks of $T$.
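The two cost expressions can be compared numerically with a short Python sketch. The function names and the block and pass counts are hypothetical; the second function assumes the Apriori-like algorithm is rerun on all subsets seen so far after each new subset arrives.

```python
def dynamic_cost(blocks, passes):
    """Disk accesses when each subset T_i is scanned K_i times on its own."""
    return sum(k * n for k, n in zip(passes, blocks))


def rerun_cost(blocks, passes):
    """Disk accesses when, after each new subset T_i, the whole file
    T_1..T_i is scanned K_i times."""
    total = 0
    blocks_so_far = 0
    for k, n in zip(passes, blocks):
        blocks_so_far += n
        total += k * blocks_so_far
    return total


# Hypothetical workload: three subsets of 100 blocks, 3 passes each.
blocks = [100, 100, 100]
passes = [3, 3, 3]
print(dynamic_cost(blocks, passes))  # 3*100 + 3*100 + 3*100
print(rerun_cost(blocks, passes))    # 3*100 + 3*200 + 3*300
```

With equally sized subsets the rerun cost grows quadratically in the number of episodes, while the dynamic cost grows linearly.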

In our preliminary experimental results, the Dynamic Mining algorithm has
shown significant potential. Four main factors have been considered in our
study, namely,

• The performance of the Dynamic Mining algorithm in terms of disk access, and CPU time. 

• The knowledge gained by using different values of a. 

• The effect of the locality values on the knowledge discovered through the data mining process. 

• The generation of the emerged-large itemsets and declined-large itemsets and the significance 
of having this information. 



5 Conclusions and Future Work 

In this paper, we have introduced a Dynamic Data Mining approach. The proposed
approach periodically performs the data mining process on data updates during the
current episode and uses the knowledge captured in the previous episode to produce
data mining rules. We have introduced the concept of locality along with the
definitions of emerged-large itemsets and declined-large itemsets. The new approach
solves some of the problems that current data mining techniques suffer from, such as
database updates, accuracy of data mining results, gaining more knowledge and
interpretation of the results, and performance.

We have discussed the Dynamic Data Mining approach. In our approach, we
dynamically update knowledge obtained from the previous data mining process.
The transaction domain is treated as a set of consecutive episodes, and the
information gained during the current episode depends on the current set of transactions
and the information discovered during the previous episode. In our preliminary
experimental results, the Dynamic Mining algorithm has shown significant
potential. We have discussed the efficiency of the Dynamic Mining algorithm in terms of
disk accesses. We have also shown the significance of the knowledge discovered by
using different values of α, and the effect of the locality values, along with the
generation of the emerged-large itemsets and declined-large itemsets, on that
knowledge. Finally, we have proved that the Dynamic Data Mining algorithm is
correct.

As future work, the Dynamic approach will be tested with different datasets that
cover a large spectrum of data mining applications, such as web site access
analysis for improvements in e-commerce advertising, fraud detection, screening and
investigation, retail site or product analysis, and customer segmentation.



References 

[1] R. Agrawal, T. Imilienski, and A. Swami, "Mining Association Rules between Sets of
Items in Large Databases," Proc. of the ACM SIGMOD Int'l Conf. on Management of
Data, May 1993.

[2] R. Agrawal, and R. Srikant, "Fast Algorithms for Mining Association Rules," Proc. of
the 20th VLDB Conference, Santiago, Chile, 1994.

[3] R. Agrawal, J. Shafer, "Parallel Mining of Association Rules," IEEE Transactions on
Knowledge and Data Engineering, Vol. 8, No. 6, Dec. 1996.

[4] C. Agrawal, and P. Yu, "Mining Large Itemsets for Association Rules," Bulletin of the
IEEE Computer Society Technical Committee on Data Engineering, 1997.

[5] S. Brin, R. Motwani, et al., "Dynamic Itemset Counting and Implication Rules for Market
Basket Data," SIGMOD Record (ACM Special Interest Group on Management of Data),
26, 2, 1997.

[6] S. Chaudhuri, "Data Mining and Database Systems: Where is the Intersection," Bulletin
of the IEEE Computer Society Technical Committee on Data Engineering, 1997.

[7] M. Chen, J. Han, and P. Yu, "Data Mining: An Overview from a Database Perspective,"
IEEE Trans. Knowledge and Data Engineering, 8, 1996.

[8] M. Chen, J. Park, and P. Yu, "Data Mining for Path Traversal Patterns in a Web
Environment," Proc. 16th Intl. Conf. Distributed Computing Systems, May 1996.

[9] D. Cheung, J. Han, et al., "Maintenance of Discovered Association Rules in Large
Databases: An Incremental Updating Technique," In Proc. 12th Intl. Conf. on Data
Engineering, New Orleans, Louisiana, 1996.

[10] U. Fayyed, G. Shapiro, et al., "Advances in Knowledge Discovery and Data Mining,"
AAAI/MIT Press, 1996.

[11] A. Hafez, J. Deogun, and V. Raghavan, "The Item-Set Tree: A Data Structure for Data
Mining," DaWaK'99 Conference, Firenze, Italy, Aug. 1999.

[12] C. Kurzke, M. Galle, and M. Bathelt, "WebAssist: a user profile specific information
retrieval assistant," Seventh International World Wide Web Conference, Brisbane,
Australia, April 1998.

[13] M. Langheinrich, A. Nakamura, et al., "Un-intrusive Customization Techniques for Web
Advertising," The Eighth International World Wide Web Conference, Toronto, Canada,
May 1999.

[14] H. Mannila, H. Toivonen, and A. Verkamo, "Efficient Algorithms for Discovering
Association Rules," AAAI Workshop on Knowledge Discovery in Databases (KDD-94),
July 1994.

[15] M. Perkowitz and O. Etzioni, "Adaptive Sites: Automatically Learning from User Access
Patterns," In Proc. 6th Int. World Wide Web Conf., Santa Clara, California, April 1997.

[16] P. Pitkow, "In Search of Reliable Usage Data on the WWW," In Proc. 6th Int. World
Wide Web Conf., Santa Clara, California, April 1997.

[17] G. Rossi, D. Schwabe, and F. Lyardet, "Improving Web Information Systems with
Navigational Patterns," The Eighth International World Wide Web Conference, Toronto,
Canada, May 1999.

[18] N. Serbedzija, "The Web Supercomputing Environment," Seventh International World
Wide Web Conference, Brisbane, Australia, April 1998.

[19] T. Sullivan, "Reading Reader Reaction: A Proposal for Inferential Analysis of Web
Server Log Files," In Proc. 3rd Conf. Human Factors & The Web, Denver, Colorado, June
1997.

[20] C. Wills, and M. Mikhailov, "Towards a Better Understanding of Web Resources and
Server Responses for Improved Caching," The Eighth International World Wide Web
Conference, Toronto, Canada, May 1999.

[21] M. Zaki, S. Parthasarathy, et al., "New Algorithms for Fast Discovery of Association
Rules," Proc. of the 3rd Int'l Conf. on Knowledge Discovery and Data Mining (KDD-97),
AAAI Press, 1997.




Knowledge-Intensive Gathering and Integration of 
Statistical Information on European Fisheries 



Mike Klinkert¹,², Jan Treur², and Tim Verwaart¹

¹ Agricultural Economics Research Institute LEI
Burgemeester Patijnlaan 19, 2585 BE Den Haag, The Netherlands
d.verwaart@lei.dlo.nl
http://www.lei.nl

² Vrije Universiteit Amsterdam, Department of Artificial Intelligence
De Boelelaan 1081a, 1081 HV Amsterdam, The Netherlands
treur@cs.vu.nl
http://www.cs.vu.nl/-treur



Abstract. Gathering, maintenance, integration and presentation of statistics are
major activities of the Dutch Agricultural Economics Research Institute LEI. In
this paper we explore how knowledge and agent technology can be exploited to
support the information gathering and integration process. In particular, the
methods used by the institute's experts in the domain of Fishery Economics are
analyzed. Also, the design and implementation of a prototype system for
intelligent information gathering and integration are discussed. The model
covers not only the actual integration, but also the processes for identification
of the type of information needed by the client, and the gathering of source
information.



1 Introduction 

In the Netherlands, the Agricultural Economics Research Institute LEI is the central 
organization for socio-economic research in agriculture, horticulture, fisheries, 
forestry and rural areas. Collection, maintenance, integration and presentation of 
statistics originating from different sources are major activities of LEI. Usually 
experts spend a lot of effort to create consistent data sets. We explored how 
knowledge and agent technology can be exploited to support the information 
gathering and integration processes. In particular, the methods used by the experts in 
the domain of Fishery Economics statistics were analyzed. Also, a process model and 
a prototype system for intelligent information gathering and integration were designed 
and implemented. The model covers actual integration as well as processes like 
identification of the type of information needed (the goal), and the subsequent 
selection of sources and gathering of information from these sources (planning). 

The European Commission and national and regional authorities require data in 
order to direct efforts, and to monitor results of fishery policy. In this area of common 
policy there are insufficient internationally coordinated statistics systems to support 
this need. Increasingly, efforts are made to ensure that reports on economic data
are created on a European level, instead of a national level exclusively. In various



R. Loganantharaj et al. (Eds.): IEA/AIE 2000, LNAI 1821, pp. 230-235, 2000.
© Springer-Verlag Berlin Heidelberg 2000




Knowledge-Intensive Gathering and Integration of Statistieal Information 23 1 



areas, concerted actions exist with the common goal of evening out the levels of expertise.
The official name for the concerted action on fishery economics is 'Concerted
Action - Co-ordination of Research in Fishery Economics (AIR CT94 1489)'. Fifteen
countries participate in this project. The information integration process is executed in
two yearly workshops with fishery experts from all participating countries. A plan for
gathering data is made in spring. In fall, the data are actually integrated.

Dol et al. [3] identified the data integration problems related to this
concerted action. Among these are:

• The various research institutes often receive data that are already in some way 
processed, without knowing exactly how this processing was done. 

• Not all countries have much experience with collecting fishery data. For some 
countries this became only an issue since they joined the European Union. 

• It is said that some governments are not completely honest. However, since those 
governments will never confirm this, this remains hearsay. 

• Many results are based on samples. A sample size may be too small, which means 
that the results after extrapolation are unlikely to reflect the real situation. 

In Section 2, the Fishery Economics domain is discussed in more detail; in the 
Appendix some realistic example tables taken from this domain are shown. Section 3 
introduces the process model and prototype system designed. In Section 4 the results 
are briefly discussed. 

2 Fishery Economics Information from Different Sources 

To acquire knowledge of the domain and the methods used, two experts were 
interviewed. They were confronted with raw data and the concerted action report [4] 
for The Netherlands and Belgium in 1993, and were asked to explain how the former 
was transformed into the latter. The report contains economic fishery data for a 
particular year or a couple of years. Some details of this report are briefly discussed 
in section 2.1. An example of differences in raw data is given in section 2.2. 

2.1 Concerted Action Report; Different Types of Tables 

The concerted action report consists primarily of tables and a (brief) rationale that 
tries to explain some of the tendencies that can be seen in the tables containing time 
series. The beam trawl fleet is the most important for the concerted action. Most tables 
give only information of the beam trawl fleet. The parameters are split into values for 
Belgium, The Netherlands, United Kingdom and the total for the EU (which is the 
sum of the three values of the previous three countries). Table 1 shows an example. 

2.2 Differences between the Information Sources 

The information presented in this section is, as mentioned earlier, derived from the 
information found in the spreadsheets. It turned out that the models for The 
Netherlands and Belgium were partially dissimilar. One of the discrepancies is in the 
way repair of hull and engine is treated. 




232 



Mike Klinkert et al. 



Table 1. Review of the North Sea beam trawl fleet, 1995.

Indicator | Total EU | B | NL | UK
Economic indicators | | | |
Value of landings (mECU) | 370 | 64 | 249 | 57
Gross value added (mECU) | 187 | 34 | 134 | 19
Gross cash flow (mECU) | 78 | 13 | 62 | 3
Net (financial) result | -14 | -2 | 1 | -13
Employment on board (FTEs) | 2550 | 400 | 1544 | 606
Invested capital (mECU) | 816 | 165 | 521 | 130
Effort (mln kW-days) | 97 | 13 | 60 | 24
Capacity indicators | | | |
Volume of landings | 146 | 25 | 87 | 34
Fleet - number of vessels | 452 | 79 | 253 | 120
Fleet - total GRT (1000) | | | |
Fleet - total kW (1000) | 452 | 45 | 317 | 90

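Since the EU total in Table 1 is defined as the sum of the three fleet segments, the table can be checked mechanically. The Python sketch below is an illustration only; the dictionary layout and the function name are assumptions, and the empty GRT row is omitted.

```python
# Rows of Table 1 as (Total EU, B, NL, UK).
table_1 = {
    "Value of landings (mECU)": (370, 64, 249, 57),
    "Gross value added (mECU)": (187, 34, 134, 19),
    "Gross cash flow (mECU)": (78, 13, 62, 3),
    "Net (financial) result": (-14, -2, 1, -13),
    "Employment on board (FTEs)": (2550, 400, 1544, 606),
    "Invested capital (mECU)": (816, 165, 521, 130),
    "Effort (mln kW-days)": (97, 13, 60, 24),
    "Volume of landings": (146, 25, 87, 34),
    "Fleet - number of vessels": (452, 79, 253, 120),
    "Fleet - total kW (1000)": (452, 45, 317, 90),
}


def inconsistent_rows(rows):
    """Return the names of rows whose total differs from the sum of its parts."""
    return [
        name
        for name, (total, *parts) in rows.items()
        if total != sum(parts)
    ]


print(inconsistent_rows(table_1))  # an empty list means every total matches
```

A check of this kind flags rows where a total does not reflect the detail figures, whether through data entry errors or rounding.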



The costing scheme for The Netherlands is: 

- Fixed costs: 

• vessel costs 

■ maintenance / repairs of hull 

■ maintenance / repairs of engines 

■ insurance 

■ 0.5 * navigation 

■ 0.5 * costs of administration, etc. 

- Variable costs: 

• fuel costs 

• other running costs 

■ deck equipment 

■ fishing gear 

■ catch conservation 

■ crew travelling costs 

■ landing costs 

■ 0.5 * navigation 

■ 0.5 * costs of administration, etc. 

• crew costs 

■ crew shares 

■ provisions 

■ social security costs 

■ special allowances 

The discrepancies between The Netherlands and Belgium are: 

- Fixed costs: 

• vessel costs 

■ 0.5 * maintenance / repairs of hull 

■ 0.5 * maintenance / repairs of engine 








- Variable costs: 

• other running costs 

■ 0.5 * maintenance / repairs of hull 

■ 0.5 * maintenance / repairs of engine 

- Navigation is not part of the Belgian specification. 
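The effect of the two costing schemes can be sketched in Python. This is an illustration under assumptions: the share tables encode only the fixed-cost fractions listed above, items not listed are treated as fully variable, the Belgian scheme is assumed to match the Dutch one wherever no discrepancy is stated, and the amounts are hypothetical.

```python
# Fraction of each cost item that counts as a fixed (vessel) cost.
NL_FIXED_SHARE = {
    "hull": 1.0,
    "engine": 1.0,
    "insurance": 1.0,
    "navigation": 0.5,
    "administration": 0.5,
}
BE_FIXED_SHARE = {
    "hull": 0.5,
    "engine": 0.5,
    "insurance": 1.0,
    "administration": 0.5,
}


def split_costs(costs, fixed_share):
    """Split raw cost items into (fixed, variable) totals for one scheme."""
    fixed = sum(amount * fixed_share.get(item, 0.0) for item, amount in costs.items())
    variable = sum(costs.values()) - fixed
    return fixed, variable


# Hypothetical raw cost items for one vessel (no navigation item recorded).
costs = {"hull": 40, "engine": 60, "insurance": 20, "administration": 10, "fuel": 100}

print(split_costs(costs, NL_FIXED_SHARE))  # Dutch scheme
print(split_costs(costs, BE_FIXED_SHARE))  # Belgian scheme
```

The same raw figures yield different fixed and variable totals under the two schemes, which is why variables with apparently identical names cannot simply be added across countries.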



3 The Process Model 

Based on the knowledge acquisition with the experts, a process model was designed
that describes the data integration. The process model and prototype system have been
designed and implemented using the compositional development method DESIRE [2]. 
This knowledge-level method allows for conceptual and formal support in modeling, 
modification and reuse. 



[Figure 1: diagram of the top-level components; recoverable link labels: "data info to user", "data info to gk".]

Figure 1. Top level of process composition.

3.1 Top Level Processes 

The top level of the process model is depicted in Figure 1. The component User 
represents the interface to the user. The component Goal Knowledge maintains the 
knowledge used in the other components. The components Goal Identification and 
Goal Realization do the actual work. In Goal Identification the overall goal is 
determined (which type of information is required), and by refining this goal the more 
specific subgoals are determined (how can the required type of information be 
composed of more specific types of information). 











3.2 Overview of all Process Abstraction Levels 

The process model has a more refined structure than described above. An overview of 
all process abstraction levels is depicted in Figure 2. For example, Goal 
Determination is composed of Goal Generation and Goal Selection. 



[Figure 2: tree diagram of process components. Recoverable component names:
toplevel; user (get_question, show_output); goal_knowledge (taxonomy, sources, data;
data_nl, data_be, data_common, data_concerted_action; aanvoer_en_besomming,
uitkomsten, ander); goal_identification (goal_determination: goal_generation,
goal_selection; subgoal_determination: subgoal_generation, subgoal_selection;
subgoal_evaluation); goal_realization (plan_determination: source_determination,
plan_generation; plan_execution: select_step, exectute_step; plan_evaluation;
link_data, retrieve_data, extrapolate_data, correct_data).]

Figure 2. Overview of process abstraction levels.



4 Discussion 

Visser et al. [6] distinguish four kinds of heterogeneity of information:

• paradigm heterogeneity,

• language heterogeneity,

• ontology heterogeneity,

• content heterogeneity.

In our approach, within the above spectrum the emphasis is on ontology
heterogeneity. The contribution of the work reported here goes beyond the technical
treatment of heterogeneity. It extends to the specification of (expert) methods for
the overall goal-directed planning process, from goal identification, source determination, and
information gathering to actual integration and presentation of information.

The focus of most research on planning and source selection for information 
integration is on efficiency of the data collection process. Much has been achieved in 
this field, e.g. [1]. In our research, we concentrate on comparing sources and handling 
inconsistencies, and on methods for the integration of heterogeneous statistics. 
Section 2.2 illustrates that integration of statistics is more complex than semantic







mapping, as practiced in systems like SIMS [5], partly because of differences in the 
composition of variables with apparently identical definitions. 

The work reported here entails an analysis of methods used by fisheries 
economists; some of the findings of this analysis are: 

• The experts could not directly reproduce the method for integration. However, 
they managed to reconstruct it by examining the spreadsheet and the report. 

• Some values in the report tables did not correspond with the values calculated in 
the spreadsheet. Most errors were small, some were presumably data entry errors. 

• Some variables appear in multiple tables. Some had different values, where they 
should be equal, probably also due to data entry errors. 

• Some totals in the tables did not exactly reflect the detail figures. This could be
due to data entry errors or to rounding.

The conclusion from these findings may be that information integration is
currently not a properly functioning process. Under time pressure, the experts seem to
make errors in the workshops. Furthermore, the argumentation behind
decisions apparently cannot be reproduced. On the basis of these observations, it is
worthwhile to investigate to what extent the quality of this information integration
process can be improved by applying artificial intelligence.



References 

1. Ambite, J.L. and Knoblock, C.A., Flexible and Scalable Planning in Distributed and
Heterogeneous Environments. In: Proceedings of the Fourth International Conference on
AI Planning Systems, 1998.

2. Brazier, F.M.T., Jonker, C.M., and Treur, J., Principles of Compositional Multi-agent
System Development. In: J. Cuena (ed.), Proceedings of the 15th IFIP World Computer
Congress, WCC98, Conference on Information Technology and Knowledge Systems,
IT&KNOWS'98, 1998, pp. 347-360. To be published by IOS Press.

3. Dol, W., Einhaus, N.F., Hennen, W.H.G.J., Salz, P., Verwaart, D., Analytical Fisheries
Economics Database. LEI, The Hague, final report of definition study for the European
Commission, 1996.

4. Economic Performance of Selected Fleet Segments in the EU 1996/2 Report, working
document nr. 10, January 1997, Concerted Action: Co-ordination of Research in Fishery
Economics (AIR CT94 1489).

5. Knoblock, C.A. and J.L. Ambite, Agents for Information Gathering. In: Bradshaw, J.
(ed.), Software Agents. AAAI/MIT Press, 1997.

6. Visser, P.R.S., Jones, D.M., Bench-Capon, T.J.M., and Shave, M.J.R., An
analysis of ontology mismatches; heterogeneity versus interoperability. AAAI 1997 Spring
Symposium on Ontology Engineering, 1997.




Using a Semantic Model and XML for 
Document Annotation 



Bogdan D. Czejdo and Cezary Sobaniec 

Department of Mathematics and Computer Science, Loyola University
6363 St. Charles Ave., New Orleans, LA 70118
Tel.: (504) 865-3340

{czejdo, csobanie}@loyno.edu



Abstract. In this paper we discuss the use of a semantic model to describe the
key terms of a text and the relationships between them. We show how this
model can be represented by annotating existing text documents using
an XML approach. As a result, a new generation of search engines can be created
that allows Internet users to find documents that satisfy structural requirements
specified by the semantic model.



1 Introduction 

Many documents in the area of science and engineering that are published on the
Web have content that can be described by a semantic model [1, 2, 3, 4]. Such a
model can be a good basis for a structural index that would allow Internet users to
obtain more satisfactory search results. In this paper we describe a semantic model
called the Term-Relationship (TR) model. We show how this model can be represented
by annotating existing text documents using an XML approach.



2 Term-Relationship Model 



To represent the knowledge contained in science and engineering documents we use
the Term-Relationship (TR) model. The TR model views the world as consisting of
terms and relationships among them. In the TR model there are two different types of
terms, SIMPLE and COLLECTION. A term is a SIMPLE term if it corresponds to a
class containing single objects. It is usually represented by a noun in the singular form.
A term is a COLLECTION term if it corresponds to a class containing a collection of
objects. It is usually represented by a noun in the plural form. For example, in Figure 1,
Computer System is a SIMPLE term because it corresponds to a class of single objects.
If the term Computer Systems were used, it would be a COLLECTION term because
it corresponds to a class for a collection of objects.



R. Loganantharaj et al. (Eds.): IEA/AIE 2000, LNAI 1821, pp. 236-241, 2000.
© Springer-Verlag Berlin Heidelberg 2000







A relationship is a meaningful connection among terms. The TR model allows an
unlimited number of relationship types; however, four types are used
most commonly: CONSISTS_OF, HAS_SUBTYPE, HAS_ELEMENT, and HAS_INSTANCE.
Relationships can be n-ary, but binary relationships are most common.

The formal model for TR is a pair (TD, RD), where TD is a set of term descriptors and
RD is a set of relationship descriptors. A term descriptor is a pair (name, termtype),
where termtype is one of the term types described above. A relationship descriptor is a
4-tuple (name, reltype, relleft, relright), where reltype is one of the relationship
types described above. To reflect the direction of the relationship, a unique identifier
for the starting term descriptor is given explicitly as relleft, together with a set of
identifiers for the ending term descriptors (relright).

The formal model can be extended to include a set of role descriptors and a set of
participation descriptors [2]. For simplicity of presentation we omit these two
components here. Let us consider a fragment of an engineering text from [5] after
some simplifications.

A computer system has two components: hardware and software. The 
equipment associated with a computer system is called hardware. A set of 
instructions called software tells the hardware what to do. 

Hardware consists of input devices, the processor, output devices and 
storage. 

Fig. 1. Simple text fragment 

Figure 2 shows a fragment of the TR model for the text presented in Figure 1.

TD = { (Computer System, Simple),
       (Hardware, Collection),
       (Software, Collection), ... }

RD = { (ConsistsOf, ConsistsOf, Computer System,
        {Hardware, Software}), ... }

Fig. 2. Part of a TR model for a simple text fragment

The TR model can also be represented graphically, as shown in Figure 3.
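The pair (TD, RD) maps naturally onto two sets of small records. The following is only a sketch under stated assumptions: the class and function names (TermType, TermDescriptor, RelationshipDescriptor, is_well_formed) are illustrative and do not come from the paper, term names double as identifiers, and Computer System is treated as a SIMPLE term as in the running text.

```python
from dataclasses import dataclass
from enum import Enum


class TermType(Enum):
    SIMPLE = "simple"
    COLLECTION = "collection"


@dataclass(frozen=True)
class TermDescriptor:
    name: str
    termtype: TermType


@dataclass(frozen=True)
class RelationshipDescriptor:
    name: str
    reltype: str          # e.g. "ConsistsOf"; the set of types is open-ended
    relleft: str          # name of the starting term
    relright: frozenset   # names of the ending terms


# Fragment of the TR model from Fig. 2 (term names double as identifiers here).
TD = {
    TermDescriptor("Computer System", TermType.SIMPLE),
    TermDescriptor("Hardware", TermType.COLLECTION),
    TermDescriptor("Software", TermType.COLLECTION),
}
RD = {
    RelationshipDescriptor(
        "ConsistsOf", "ConsistsOf", "Computer System",
        frozenset({"Hardware", "Software"}),
    ),
}


def is_well_formed(td, rd):
    """Return True if every term named in a relationship has a descriptor."""
    names = {t.name for t in td}
    return all(r.relleft in names and r.relright <= names for r in rd)
```

Keeping relright as a set rather than a single name is what lets the same record express both binary and n-ary relationships.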



3 Semantic Model Annotation Language 

3.1 XML 

XML [5] is a language designed specifically for Web technology. XML is a subset of
SGML and defines device- and system-independent methods of representing texts in
electronic form. Strictly speaking, XML is a metalanguage, since it allows markup
languages to be described formally. A markup language is a set of markup conventions
used together for encoding texts. It specifies what markup is allowed,
what markup is required, and how markup is distinguished from text. A markup
language document consists of text and markup. Markup describes annotations that
instruct software to apply special formatting or any other special processing.








Fig. 3. A graphical representation of the TR model for simple text fragment 



The basic concept of XML is that documents are composed of a series of entities
(objects). Each entity can contain one or more logical elements. Each element can
have certain attributes that further describe the properties of the element. XML
provides a formal syntax for describing relationships between entities, elements and
attributes that make up a markup language.



3.2 Text Annotation 



In order to include semantic information within textual documents, we have developed
our own markup language called Semantic Model Annotation Language
(SMAL). Figure 4 presents an example of a document file with SMAL annotations.

<?xml version="1.0" encoding="us-ascii"?>
<!DOCTYPE smaldoc SYSTEM "smal.dtd">
<smaldoc>
A <termref tid="1">computer system</termref> <relref rid="1">has</relref>
two components: <termref tid="2">hardware</termref> and
<termref tid="3">software</termref>. The equipment associated with a
<termref tid="1">computer system</termref> is called
<termref tid="2">hardware</termref>.
A set of instructions called <termref tid="3">software</termref>
tells the <termref tid="2">hardware</termref> what to do.

<termref tid="2">Hardware</termref> <relref rid="2">consists of</relref>
<termref tid="5">input devices</termref>, the
<termref tid="4">processor</termref>,
<termref tid="6">output devices</termref> and
<termref tid="7">secondary storage</termref>.

<term tid="1">
  <termname>computer system</termname>
</term>
<term tid="2" type="collection">
  <termname>hardware</termname>
</term>
<term tid="3" type="collection">
  <termname>software</termname>
</term>
<term tid="4">
  <termname>processor</termname>
</term>
<term tid="5">
  <termname>input device</termname>
</term>
<term tid="6">
  <termname>output device</termname>
</term>
<term tid="7">
  <termname>secondary storage</termname>
</term>
<rel rid="1">
  <reltype>consistsof</reltype>
  <relleft tid="1" />
  <relright tid="2" />
  <relright tid="3" />
</rel>
<rel rid="2">
  <reltype>consistsof</reltype>
  <relleft tid="2" />
  <relright tid="5" />
  <relright tid="4" />
  <relright tid="6" />
  <relright tid="7" />
</rel>
</smaldoc>



Fig. 4. Simple text fragment with annotations 

The first two lines form the preamble of the XML document. They indicate that this is an
XML document and that the formal description of the markup language (DTD) is stored
in the file smal.dtd. The whole content of the document is enclosed in a
<smaldoc> element, which represents a document with semantic model annotations.
Within the content of the document, we have used elements to identify the terms that are
important to us, along with the relationships between them. The following section formally
describes the proposed language.
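To illustrate how an annotated document yields a TR model, the sketch below reads a shortened version of the Fig. 4 example with Python's standard xml.etree.ElementTree module. The function name extract_tr_model and the returned dictionary layout are assumptions made for this illustration, not part of SMAL. Because the parser here does not load smal.dtd, the default term type is applied in code.

```python
import xml.etree.ElementTree as ET

SMAL_SAMPLE = """<smaldoc>
A <termref tid="1">computer system</termref> <relref rid="1">has</relref>
two components: <termref tid="2">hardware</termref> and
<termref tid="3">software</termref>.
<term tid="1"><termname>computer system</termname></term>
<term tid="2" type="collection"><termname>hardware</termname></term>
<term tid="3" type="collection"><termname>software</termname></term>
<rel rid="1">
  <reltype>consistsof</reltype>
  <relleft tid="1"/>
  <relright tid="2"/>
  <relright tid="3"/>
</rel>
</smaldoc>"""


def extract_tr_model(smal_text):
    """Return (terms, relationships) read from a SMAL-annotated document."""
    root = ET.fromstring(smal_text)
    # tid -> (name, type); "simple" is the default type when none is given.
    terms = {
        term.get("tid"): (term.findtext("termname").strip(),
                          term.get("type", "simple"))
        for term in root.iter("term")
    }
    # rid -> (reltype, starting tid, list of ending tids)
    relationships = {
        rel.get("rid"): (
            rel.findtext("reltype").strip(),
            rel.find("relleft").get("tid"),
            [right.get("tid") for right in rel.findall("relright")],
        )
        for rel in root.iter("rel")
    }
    return terms, relationships


terms, relationships = extract_tr_model(SMAL_SAMPLE)
```

The two dictionaries correspond to the sets TD and RD of the formal model, keyed by the tid and rid identifiers used in the annotations.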



3.3 Definition of the Semantic Model Annotation Language 

In XML, the definition of a new markup language is given in the form of a Document Type
Definition (DTD). A DTD is used to verify the consistency between documents and their
specifications. A DTD is flexible enough to describe any logical text structure: a form,
letter, report, book, encyclopedia, dictionary or database.

The method of describing the structure of documents in XML is similar to other
language specification methods, such as BNF. There are two types of statements within
DTD specifications. The first type is the specification of language elements. Prefix
notation is used instead of the traditional infix notation. For example the statement:







<!ELEMENT rel (reltype, relleft, relright+)>

means that the language element rel consists of a sequence of elements: reltype,
relleft, and relright. The relright element may be repeated one or more
times.

The second type of DTD statement defines attributes for specific language elements,
again using prefix notation. For example the statement:

<!ATTLIST relleft
    tid CDATA #REQUIRED>

means that the language element relleft has an attribute tid, which is of the type
character data (CDATA). A value for that attribute must be provided in a document.
Figure 5 presents the Document Type Definition of our Semantic Model Annotation
Language.

 1: <?xml version="1.0" encoding="us-ascii"?>
 2:
 3: <!ELEMENT smaldoc (#PCDATA | term | termref | rel | relref)*>
 4:
 5: <!ELEMENT term (termname)>
 6: <!ATTLIST term
 7:           type (simple | collection) "simple"
 8:           tid CDATA #REQUIRED>
 9:
10: <!ELEMENT termname (#PCDATA)>
11:
12:
13: <!ELEMENT termref (#PCDATA)>
14: <!ATTLIST termref
15:           tid CDATA #REQUIRED>
16:
17: <!ELEMENT rel (reltype, relleft, relright+)>
18: <!ATTLIST rel
19:           rid CDATA #REQUIRED>
20:
21: <!ELEMENT reltype (#PCDATA)>
22:
23: <!ELEMENT relleft EMPTY>
24: <!ATTLIST relleft
25:           tid CDATA #REQUIRED>
26:
27: <!ELEMENT relright EMPTY>
28: <!ATTLIST relright
29:           tid CDATA #REQUIRED>
30:
31: <!ELEMENT relref (#PCDATA)>
32: <!ATTLIST relref
33:           rid CDATA #REQUIRED>



Fig. 5. Document Type Definition of SMAL



The whole document should be enclosed in <smaldoc> tags (an opening
<smaldoc> tag and a closing </smaldoc> tag). The document can contain any text;
important terms and the relationships between them are distinguished using the
elements <termref> and <relref> (line 3). A <termref> tag marks an occurrence of a
term that we are interested in. Terms are described using the <term> element (line 5)
and identified by the tid attribute (line 8). Each term has a specific type, which is
indicated by the type attribute (line 7). By default, all terms are simple terms.
References to terms within the text are marked by the <termref> element; the tid
attribute of the <termref> element points to a specific term description (line 15).

Relationships between terms are described by the <rel> element, which consists of
three other elements: <reltype>, <relleft>, and <relright> (line 17).
Relationships are uniquely identified by a similar attribute, rid (line 19).
The <reltype> element describes the type of the relationship (line 21). The <relleft>
and <relright> elements contain references to the terms that are related. These elements
have a tid attribute that refers to the tid attribute of a <term> element (lines 25 and 29).
There must be exactly one <relleft> element within the <rel> element, and at
least one <relright> element (line 17). It is therefore possible to represent binary
relations as well as n-ary relations. Within the text, there can be references to
particular relationships, which are indicated by <relref> elements (line 31). The rid
attribute of a <relref> element points to an existing relationship identified by the same
value of the rid attribute of a <rel> element (line 33). All identifiers are required.
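Because tid and rid are declared as CDATA, a DTD-validating parser checks only that the attributes are present, not that a reference actually points at an existing <term> or <rel>. A separate consistency check can cover that gap. The following is a minimal sketch using Python's standard xml.etree.ElementTree; the function name find_dangling_references and the two test documents are assumptions introduced for illustration.

```python
import xml.etree.ElementTree as ET


def find_dangling_references(smal_text):
    """List references in a SMAL document that point at undefined identifiers."""
    root = ET.fromstring(smal_text)
    term_ids = {t.get("tid") for t in root.iter("term")}
    rel_ids = {r.get("rid") for r in root.iter("rel")}
    problems = []
    # termref, relleft and relright must all name an existing <term>.
    for tag in ("termref", "relleft", "relright"):
        for elem in root.iter(tag):
            if elem.get("tid") not in term_ids:
                problems.append((tag, elem.get("tid")))
    # relref must name an existing <rel>.
    for elem in root.iter("relref"):
        if elem.get("rid") not in rel_ids:
            problems.append(("relref", elem.get("rid")))
    return problems


GOOD = ('<smaldoc><termref tid="1">a</termref>'
        '<term tid="1"><termname>a</termname></term></smaldoc>')
BAD = ('<smaldoc><termref tid="9">a</termref><relref rid="4">has</relref>'
       '<term tid="1"><termname>a</termname></term></smaldoc>')
```

An annotation tool could run such a check after each edit so that a document never contains a reference without a matching description.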



4 Summary 

In this paper we described the use of a semantic model to define the key terms, and the
relationships between them, of text documents in the area of science and
engineering published on the Web. We discussed how this model could be represented by
annotating existing text documents using an XML approach. New documents
can be annotated using the same approach. In both cases special annotation tools
could be used.

We expect that the result of our research will be a good basis for a new generation
of Web search engines and/or a new generation of Web browsers. The use of
annotations based on a semantic model can make searches more precise and better
matched to users' requests. It will be especially applicable to documents in the area
of engineering or science, where relationships between concepts can be identified more
precisely.



References 

1. Capron, H. L., Computers - Tools for an Information Age, Addison-Wesley
Longman Publishing Company, 1998.

2. Embley, D., Kurtz, B., Woodfield, S., Object-Oriented Systems Analysis - a
model driven approach, Prentice Hall, New Jersey, 1992.

3. Rumbaugh, J., Blaha, M., Premerlani, W., Eddy, F., Lorensen, W., Object-Oriented
Modeling and Design, Prentice Hall, New Jersey, 1991.

4. Salton, G., McGill, M., Introduction to Modern Information Retrieval,
McGraw-Hill Book Company, New York, 1983.

5. Extensible Markup Language (XML) 1.0, W3C Recommendation,
http://www.w3.org/TR/REC-xml.




Understanding Support of Group in Web 
Collaborative Learning, Based on Divergence 
among Different Answering Processes 



Tomoko Kojiri and Toyohide Watanabe 

Department of Information Engineering,
Graduate School of Engineering, Nagoya University
Furo-cho, Chikusa-ku, Nagoya 464-8603, JAPAN
Phone: +81-52-789-2735, Fax: +81-52-789-3808
E-mail: {kojiri,watanabe}@watanabe.nuie.nagoya-u.ac.jp



Abstract. Web-based collaborative learning has recently become one of the
hottest research subjects. In collaborative learning, not only deriving the answer but
also considering various methods is important for acquiring the ability
to deal with different exercises. Therefore, our objective is to grasp the
process of solving the exercise, identify the derived answering paths, and then
generate advice that makes the learning effective.

In this paper, we focus on a mechanism that grasps the extent of discussion
and detects answering methods that the students need to discuss. Our
approach is to arrange the answering paths on a “discussion-extent projection”
based on the similarity among them and to detect the answering
paths that involve a viewpoint different from that of the derived one. In addition, we
introduce the “divergent tree”, which represents all answering paths with
respect to the divergent points among them, in order to calculate the
similarity.



1 Introduction 

The Internet has spread rapidly, and students now have the chance
to learn anywhere and at any time through the web. Today,
CSCL (Computer Supported Collaborative Learning) based on the
web environment is one of the hottest research topics. The research is roughly
divided into two groups: one focuses on the interaction among students rather than
on knowledge acquisition, and the other takes the opposite stance.

SharlokII [1] grasps the understanding or interests of individual students
according to their movements and presents them to support knowledge
awareness, which helps students notice the knowledge or existence of other students.
This kind of research focuses on the interaction problem in the learning
group but not on the comprehension and intervention problems of the learning
progress.

In collaborative learning, the appropriate learning attitude is that students
solve the exercise by themselves and that all students understand the learning process.



R. Loganantharaj et al. (Eds.): IEA/AIE 2000, LNAI 1821, pp. 242-249, 2000.
© Springer-Verlag Berlin Heidelberg 2000







However, since students can participate in the learning at any time in the
web environment, the learning does not always proceed effectively. Therefore, it
is necessary to grasp the learning progress of the group and guide the learning
so that it is effective. To meet this requirement, our research objective is
to monitor the discussion in the learning group, grasp the learning situation,
and generate appropriate advice at the right time, so that all students
acquire the knowledge of the exercise.

On the other hand, COLAS [2] evaluates the learning progress of each student
and notifies the teaching staff if students who have reached an impasse are
found. The system proposed by Nakamura et al. [3] introduces pseudo students
who participate in the discussion if the corresponding students do not join
the discussion actively or cannot understand the stage being discussed. These
studies support collaborative learning by grasping the understanding level
of each student. On the web, however, an indefinite number of students may form
a learning group, and it is unsuitable to deal with them individually. Thus,
we support collaborative learning according to the situation of the learning
group rather than of individual students.

Based on this viewpoint, we have already presented a mechanism for detecting
situations in which the learning group does not proceed with the learning
effectively by itself [4,5]. This mechanism copes with impasses
and un-collaborative situations in the learning group. In this paper, we
propose a mechanism to deepen the students’ knowledge of the exercise
and to support minority opinions. In the learning activity, it is important to
know various methods in order to acquire the ability to deal with different
exercises. Thus, we propose to estimate the similarities among answering
paths and to complement the learning process if underived answering paths
that contain different answering methods exist. We introduce the
discussion-extent projection to arrange the answering paths based on their
similarity and to detect the answering paths on which the system advises.

In order to calculate the degree of similarity, we represent the correct answering
paths of an exercise as a divergent tree, which shows the whole answering process with
respect to the divergent points among answering paths. The degree of similarity
among different answering paths can be estimated from their divergent
points. Namely, if answering paths diverge at an early answering stage, we
may consider their degree of similarity to be small, because they
have only a few answering stages in common. Based on this divergent tree, the
system evaluates the degree of similarity among answering paths and estimates
the extent of learning by arranging them on the discussion-extent projection.

2 Viewpoint 

2.1 Grasping Learning Group 

In collaborative learning, the discussion among students aimed at
solving the exercise is important. As students explain their opinions to others
or help students who do not understand, according to the learning situation, they
develop the ability to express their opinions clearly as well as to consider the
exercise deeply. Therefore, we respect the discussion activity in the learning
group and support effective collaborative learning from the viewpoint of promoting
the discussion.

WMCLS [6] is constructed from this point of view and proposes
an agent-based collaborative learning environment. In this system, agents that
correspond to individual students participate in the discussion in order to
change the current topic to one that the corresponding students do not understand.
Although this mechanism activates the discussion, the new topic introduced
by the agent is not always meaningful for all the students in the learning group.
Therefore, our system monitors the learning situation of the group, not of individual
students, and generates advice that is helpful for all the students.



2.2 Approach 

The advice that the system generates should raise the basic understanding levels
of the students as well as promote active discussion among them.
Currently, we investigate collaborative learning from the following viewpoints.

1) Progress of solving answer.

2) Extent of discussion.

For 1), we have proposed the resolution derivation scenario and indicators [4,5]. The
resolution derivation scenario is a representation of the learning process to derive
the answer, constructed according to the derived ratio in the whole answering
process. On this scenario, the indicators represent the learning group’s learning
situation by only 3 pointers, which indicate the current discussion stage and the
highest and lowest understandings in terms of the derived ratio in the whole
answering process.

In this paper, we focus on 2). The extent of discussion is estimated
from the number of utterances about different answering paths and the degree of
difference among the answering paths that have been derived. That is, if a variety of
answering paths are considered in the discussion process, the system judges
that the students considered the exercise deeply and that the discussion was
successful. To grasp the extent, we introduce the discussion-extent projection, on which
individual answering paths are arranged after the answer has been derived. The system
grasps the extent of discussion from the distribution of answering paths on the
discussion-extent projection.

The degree of similarity among different answering paths is estimated from
the divergent point. If answering paths diverge at an early stage, they are regarded
as being based on different answering methods. Even if different answering
paths result in the same answering stage, we consider that their conceptual ideas
are different from each other. In order to grasp the degree of similarity among
answering paths, we represent all answering paths of an exercise as a divergent tree.
The divergent tree describes the answering process of the exercise from the viewpoint
of the divergence among answering paths. According to the divergent tree, our system
arranges the derived answering paths on the discussion-extent projection.

2.3 Collaborative Learning Environment 

In our collaborative learning environment, students carry out the learning through an
interaction space and an answer-board screen. The interaction space is like a chat
environment that supports free discussion. The answer-board
screen, on the other hand, is a public communication tool used to arrange the derived
answering paths of the learning group and to announce students’ own opinions to all
other students. The system monitors only the inputs to these tools and infers
the learning group’s answering activity.

Furthermore, we restrict ourselves to situations in which the final answering path of the
learning group is unique. That is, if different opinions arise at the same
time, the students choose one of them immediately. Under these circumstances,
we assume collaborative learning among high school students who study
together to solve an exercise that has a correct answer. In this paper, we focus on
mathematical exercises, especially the computation of the roots of equations.

3 Divergent Tree 

3.1 Structure of Divergent Tree 

The divergent tree is a tree structure that describes the processes of deriving
the answer in terms of the divergence points among answering paths. Each node contains a
series of answering stages that does not include a divergence. Edges indicate the
sequence of nodes; namely, the answering stages corresponding to child nodes are
derived after those of the parent node. Therefore, each path of this tree from the root
node to a leaf node corresponds to a particular answering path from the beginning
to the end.

Nodes in the divergent tree are composed of distinguishable words and a ratio
of proceeding. Distinguishable words are the meaningful statements that discriminate
the node from other nodes. When one of the distinguishable words is input,
the system identifies the node currently under discussion in the divergent tree. The
ratio of proceeding is set for the purpose of estimating the similarity among answering
paths. It corresponds to the position of the node in the whole answering
process and takes a value between 0 and 100. Once the answer has been
derived, the system grasps the degree of similarity among answering paths based
on these values.
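The node structure described above can be sketched as a small record type with two helper functions, one that locates the node under discussion from an utterance and one that enumerates answering paths. This is an illustration under stated assumptions: the names Node, find_node and answering_paths, the simple substring match, and the example equations and ratios are all hypothetical and are not taken from the paper's Table 1 or Figure 1.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: int
    ratio: int                      # ratio of proceeding, 0..100
    words: tuple = ()               # distinguishable words (here: equations)
    children: list = field(default_factory=list)


def find_node(root, utterance):
    """Return the first node (pre-order) one of whose words occurs in the utterance."""
    stack = [root]
    while stack:
        node = stack.pop()
        if any(word in utterance for word in node.words):
            return node
        stack.extend(reversed(node.children))
    return None


def answering_paths(root):
    """Enumerate every root-to-leaf path as a list of node ids."""
    if not root.children:
        return [[root.node_id]]
    return [[root.node_id] + rest
            for child in root.children
            for rest in answering_paths(child)]


# Hypothetical tree: the equations and ratios are invented for illustration.
leaf_a = Node(3, 70, ("(x - 2)(x - 3) = 0",))
leaf_b = Node(4, 70, ("D = 5^2 - 4*6",))
mid = Node(2, 40, ("x^2 - 5x + 6 = 0",), [leaf_a, leaf_b])
root = Node(1, 0, ("x^2 + 6 = 5x",), [mid])
```

In this toy tree the two leaves stand for two methods (factorization and the discriminant) that diverge after node 2, so node 2 is their divergence point.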



3.2 Example of Divergent Tree 

In mathematical exercises, each answering path is divided into several answering
stages according to the formulas or methods that are used to derive the answer, and
each stage contains particular equations that represent the result of that stage.
The distinguishable words in individual nodes are therefore identified by the equations
included in the corresponding answering stages.

Table 1. Example of exercise and answering path

Table 1 shows an example of an exercise and one of its answering paths. The answering
path is divided into 5 answering stages according to the formulas or methods, and the
underlined equations are the meaningful statements in the individual answering stages.
In this path, the distinguishable words are defined as the underlined equations.




Fig. 1. Example of constructing divergent tree



The ratio of proceeding is set in the same way as scores are assigned on
answer sheets. When a particular statement is written on an answer
sheet, its score is decided according to the position of the statement
in the whole answering process. That is, if a statement is written by which we consider
80% of the answering process to have been derived, we give the score “80” to this
statement. Thus, the ratio of proceeding is determined by the position of the node in
the whole answering process.

Figure 1 shows an example of constructing the divergent tree. If several
answering paths for the exercise in Table 1(a) are arranged as in Figure 1(a), based
on the ratio in the whole answering process, the divergent tree of this exercise is
constructed as in Figure 1(b). In the tree structure, circled numbers are the
identifiers of nodes, and the equations in each node are the distinguishable words
used to specify the node under discussion. Numbers in parentheses
show the ratio of proceeding.

4 Discussion-Extent Projection 

4.1 Structure of Discussion-Extent Projection 

The discussion-extent projection is a straight line graded from 0 to
100, which shows the degree of similarity to the final answering path. When the
answer has been derived, the final answering path is assigned the value 100, and the
other paths are assigned values according to their degree of similarity, based on the
ratio of proceeding in the divergent tree.

Once the answer has been derived, the system arranges the answering paths that
are not contained in the final answering path on the discussion-extent projection
and thereby grasps the extent of discussion. When there is an utterance about an
answering path that has low similarity to the final answering path, the system
judges that the utterance is quite different from the final answering path, so it may
be of great worth to discuss the method of that answering path.

4.2 Arrangement of Answering Paths 

The system arranges individual answering paths on the discussion-extent projection
according to the ratio of proceeding of the nodes in the divergent tree. Since the
similarity of answering paths is estimated from the position of their divergence point,
the value of an individual answering path on the discussion-extent projection is set
to the ratio of proceeding of the node that includes the stage at which it diverges
from the final answering path.

Figure 2 shows an example of the arrangement of answering paths for the
exercise in Table 1(a). When we assume that the students solve the exercise by
path5, the other answering paths are arranged on the discussion-extent projection
as in Figure 2. Namely, path5 is placed at the value “100” and path4 at the
value “70”, since the nearest common node between path5 and path4 has the
value “70” for the ratio in the whole answering process. In the same way, path2 and
path3 are placed at the value “30” and path1 at “0”.

[Figure: scale from 0 to 100 with path1 at 0, path2 and path3 at 30, path4 at 70, and path5 at 100]

Fig. 2. Example of arrangement on discussion-extent projection
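The arrangement rule can be sketched in a few lines: each path receives the ratio of proceeding of the last node it shares with the final answering path, and the final path itself receives 100. The tree below is hypothetical and was chosen only so that the result reproduces the values quoted in the text (0, 30, 30, 70, 100); the node identifiers n1 to n3, the leaf labels, and the function name projection_values are assumptions, since the actual tree of Figure 1(b) is not reproduced here.

```python
def projection_values(paths, ratio, final):
    """Map each path name to its position (0..100) on the projection.

    paths: path name -> list of node ids from the root to a leaf
    ratio: node id -> ratio of proceeding of that node
    final: name of the final answering path, which is placed at 100
    """
    final_nodes = paths[final]
    values = {}
    for name, nodes in paths.items():
        if name == final:
            values[name] = 100
            continue
        # Walk both paths from the root; the last shared node is where they diverge.
        last_common = None
        for a, b in zip(nodes, final_nodes):
            if a != b:
                break
            last_common = a
        values[name] = ratio[last_common]
    return values


# Hypothetical tree chosen only to reproduce the values quoted in the text.
RATIO = {"n1": 0, "n2": 30, "n3": 70}
PATHS = {
    "path1": ["n1", "a"],
    "path2": ["n1", "n2", "b"],
    "path3": ["n1", "n2", "c"],
    "path4": ["n1", "n2", "n3", "d"],
    "path5": ["n1", "n2", "n3", "e"],
}
values = projection_values(PATHS, RATIO, final="path5")
```

The sketch assumes that all paths start at the same root node, so a shared node always exists.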







5 Experiment 

In this section, we simulate collaborative learning based on answers that
we asked students in our laboratory to produce individually. We asked 13 students
in our laboratory to solve the exercise in Table 1(a). 10 students were able to
derive the answer. However, 1 student was not able to derive the
answering stage following node 1, and 2 students could not proceed
beyond node5. 6 students derived the answer by path2, 1 student
by path3, 2 students by path4, and 1 student by path5. We now simulate the
collaboration of a learning group consisting of these 13 students.
We assume that each student wants to solve the exercise by
the answering path that he/she submitted. Since our research is limited to the case
in which the final answering path of the learning group is unique, the students cannot
each solve the exercise by their own answering path but have to discuss and choose one of
them. Under this assumption, we consider the following case and explain how it is
handled by means of the discussion-extent projection.

case: Students solve the exercise by path2 with a hot discussion at 
node6 in the divergent tree. 

In this case, the learning situation is represented on the discussion-extent 
projection as shown in Figure 3. We suppose that the learning group derives the 
answer by path2 and that a hot discussion occurs at the divergent point node6. 
Under this condition, node6 will not be a candidate answering stage for the 
system’s advice, because the system judges that this divergent point has already 
been well discussed. If the system generates advice in this situation, it explains 
the answering method of path1, since path1 uses a totally different answering 
method. 



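[Figure 3: discussion-extent projection with an axis from 0 to 100; path2 is the final answering path, with path1 (no student), path3 (1 student), path4 (2 students), and path5 (1 student) arranged along the axis, and two markers indicating where 1 student and 2 students retired] 

Fig. 3. Discussion-extent projection corresponding to the case 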



6 Conclusion 

We have been studying how to support collaborative learning in the web 
space. In this paper, we focused on the situation after the answer has been 
derived, and addressed a mechanism, based on the discussion-extent projection, 
for deciding the answering paths or answering stages about which the system 
generates advice. Although this mechanism is very simple, we consider that the 
information detected from it is enough to find the stage about which the system 
should generate advice in collaborative learning. 

Our future work includes: 1) implementation and evaluation of the 
proposed mechanism, 2) a mechanism to generate advice, and 3) treatment of 
learning groups that have several subgroups. 

Acknowledgments 

The authors are very grateful to Prof. T. Fukumura of Chukyo University, and 
Prof. Y. Inagaki and Prof. J. Toriwaki of Nagoya University for their perceptive 
remarks, and also wish to thank our research members for their many discussions 
and cooperation. 



References 

1. H. Ogata, K. Imai, K. Matsuura, and Y. Yano: “Knowledge awareness map for 
open-ended and collaborative learning on world wide web.” Proc. of ICCE’99, Vol. 
1, pp. 319-326, (1999). 

2. S. Watanabe, T. Nakabayashi, H. Satoh, T. Jiang, and T. Oda: “Web-based 
educational system: Monitoring and assisting learners.” Proc. of ICCE’99, Vol. 1, pp. 
693-700, (1999). 

3. M. Nakamura and S. Otsuki: “Group learning environment based on hypothesis 
generation and inference externalization.” Proc. of ICCE’98, Vol. 2, pp. 535-538, 
(1998). 

4. T. Kojiri and T. Watanabe: “Adaptable learning environment for supporting a 
group of unspecified participants in web.” Proc. of SITE’99, pp. 1937-1942, 
(1999). 

5. T. Kojiri and T. Watanabe: “A management method of learning situation in 
collaborative learning.” Proc. of ICCE’99, Vol. 1, pp. 386-393, (1999). 

6. G. Liming, H. Minghua, and Q. Yuhui: “A web-based multi-agent collaboration 
learning system.” Proc. of ICCE’98, Vol. 1, pp. 205-210, (1998). 




Fuzzy Modeling Approach for Integrated Assessments 
Using Cultural Theory 



Adnan Yazici^1*, Fred E. Petry^2, and Curt Pendergraft^3 

^1 Dept. of Computer Engineering, METU, Ankara-Turkey 
^2 Dept. of Elec. Eng. and Comp. Sc., Tulane Univ., New Orleans, LA 70118 
^3 The American Outback, Colorado Springs, CO 80903 



Abstract: It has already been noted that predicting societal responses 
accurately requires the use of a formal model such as cultural theory. A basic 
belief of cultural theory is that all societies and their underlying worldviews, 
irrespective of time or place, must be more or less hierarchic, more or less 
individualistic, more or less egalitarian, or more or less fatalistic. This approach 
has a potential for cross-temporal and spatial comparisons that makes it a 
particularly attractive instrument for a study of the human dimensions of global 
climate change. However, a significant difficulty in the previous attempts at 
utilizing cultural theory in integrated assessment models (IAMs) has been the 
inexactness or uncertainty inherent in both IAMs and cultural theory. In this 
paper we introduce a fuzzy-based modeling approach that makes use of cultural 
theory in an integrated assessment approach to provide a mechanism for 
understanding the reaction of a populace to environmental policy decisions. 



1 Introduction 

Integrated assessment models (IAMs) attempt to integrate information used to assess 
the climate change related to global warming by linking mathematical representations 
of different components of natural and social systems in a computer model [1]. That 
is, IAMs have the potential to provide meaningful input to global policies with regard 
to climate change. However, it has been pointed out that IAMs containing 
socio-economic models are lacking in the social component. One other effort [6] has also 
found that Cultural Theory (CT) is effective in IAMs. A significant difficulty in this 
approach has been the inexactness or uncertainty inherent in IAMs and CT. CT is one 
of the theories that involve the classification of people in different parts of the US into 
“cultural groups”. According to Cultural (or Grid-Group) Theory [2], these groups are 
Individualists, Egalitarians, Hierarchs, and Fatalists. 

An axiom of CT is that all societies and their underlying worldviews, irrespective 
of time or place, must be more or less hierarchic, individualistic, egalitarian, or 
fatalistic. These four cultural leanings are defined by the two dimensions of “Grid” 
and “Group”, which respectively describe the number and strength of behavioral 
prescriptions and proscriptions imposed by living in a particular way, and the strength 
of people’s attachment to the community that lives that way. CT’s potential for 
cross-temporal and spatial comparisons makes it a particularly attractive instrument for a 
study of the human dimensions of global climate change. In fact, it is shown in [3] 
that a person’s cultural classification is a much better predictor of that person’s 
attitude toward environmental questions than, for example, race, gender, education, 
or economic situation. 

* Prof. A. Yazici is currently visiting the Department of EECS, Tulane Univ., New 
Orleans, LA 70118, USA. 

R. Loganantharaj et al. (Eds.): IEA/AIE 2000, LNAI 1821, pp. 250-260, 2000. 
© Springer-Verlag Berlin Heidelberg 2000 

The problem then becomes: “how does one put this idea into an integrated 
assessment model?” It has already been pointed out that uncertainty plays a key role 
in assessment modeling [6]. We think that one possible solution to this problem is to 
use fuzzy set theory in modeling cultures, environmental policies, and the 
relationships among the policies and cultures. Fuzzy set theory can play an important 
role in integrated assessment modeling by identifying, representing, and modeling the 
uncertainties involved in the problem domain. Our fuzzy-based modeling approach, 
which utilizes the extended cultural theory, can be used in integrated assessment of 
climate change with the identification of these groups and modeling the behavior of 
each. One may go further to predict the responses of the cultural groups of individuals 
to short-term climate fluctuations and to tailor the forecasting message to them by 
using our modeling approach presented here. 

The rest of the paper is organized as follows: Section 2 first summarizes cultural 
theory. In the next section we discuss how CT is utilized in evaluation of policy 
decisions. Section 4 includes our fuzzy-based modeling approach utilizing CT in 
assessing policy decisions. We use the proposed model in a rather simple application, 
evaluation of a policy on deregulation of electric power production, and the results 
and related descriptions are presented in Section 5. Finally, Section 6 provides 
conclusions and future work. 

2 Cultural Theory 

The roots of CT [2] are in the philosophical/sociological work of Tocqueville, 
Nietzsche, Toennies, Simmel, Durkheim, and Fleck, all of whom recognized the 
divergent societal consequences of worldviews that might adequately be described as 
egalitarian materialism, centralization, and anomic individualism. Other researchers, 
such as Mary Douglas and Aaron Wildavsky, built on this foundational categorization, 
developing what is now known as CT. Numerous others have applied CT in analyses 
of perceptions of and attitudes toward environmental issues and policies. Among 
these scholars are Steve Rayner, Michael Thompson, and Richard Ellis. 

Among this work, Pendergraft [3] suggests that using CT to classify survey 
respondents into clusters produces a stronger indicator of attitudes toward 
environmental threats than do such commonly used variables as political party, 
political philosophy, age, gender, income, or education. This result comes from a 
random survey of 762 persons in selected counties and parishes in the USA. 

What CT adds to the analytic arsenal is an enhanced ability to go beyond mere 
description toward verification and practical utility. A major virtue of CT is its 
universality of application to society: societies and the cultures that inform their 
institutions may be cross-temporally and -spatially compared using this simple but 
powerful theory because any society, any culture must be simultaneously more or less 
egalitarian, hierarchic, individualistic, and apathetic. CT helps us link diverse levels 
of analysis - from the individual level to the global. If indeed people’s perceptions of 
and reactions toward threats are shaped by the socio-cultural lenses through which 
they perceive reality, the very different implications of egalitarianism, hierarchy, 
fatalism and individualism can be illumined by CT. CT also equips us to consider 
adaptation at both individual and collective levels. 

After a decade of research, Cultural Theorists are just now finding which types of 
questions are most effective and efficient in providing data that can be easily 
interpreted. As this discovery process progresses, studies increasingly suggest that 
social networks, which are united on the basis of “tacit knowledge”, are the most 
important single influence on how information about environmental hazards is 
perceived [4]. The principle was well expressed three decades earlier by Michael 
Polanyi: “Our believing is conditioned at its source by our belonging” [5]. 

Even when drastic events overwhelm any rational possibility that their reality can 
be denied, the issue of what to do about the problems may still be shaped by cultural 
predisposition. Egalitarians will blame individualists, hierarchs, and fatalists, 
recommending and supporting redistributive policies to restore the balance they long 
for. Individualists will blame hierarchs for abusing undeserved power, and egalitarians 
for squelching actions that could allow the situation to be coped with. Hierarchs will 
blame unruly, selfish individualists and impractical egalitarians. Fatalists will just try 
to endure. 

Cultural theory provides a foundation for predicting what sorts of reactions are 
likely from which groups. Thus, it should also suggest which remedial policies have 
the best chance to achieve a critical mass of support for their successful formulation 
and implementation. 

3 Cultural Theory in Evaluation of Policies 

In general, there are several "policy characteristics" used in policy evaluation. These 
are sometimes summed up as the "3E's and 3P's:" Effectiveness, Efficiency, Equity, 
Participation, Predictability, and Procedure. 

Effectiveness means that the policy does what it was intended to do. It is the 
major concern of Hierarchs. Efficiency means that the benefits of the policy are cost- 
effective. This is a measure that individualists are especially concerned about. Equity 
is of particular concern to egalitarians. 

Participation is an interesting criterion. Egalitarians like the maximum, broadest 
range of public input, while hierarchs would prefer that experts have the most say, 
and individualists are concerned that stakeholders have influence. 

Predictability is important for both hierarchs and individualists, who must plan 
around the administration of a policy to accomplish their goals. 
Egalitarians might more often wish that a policy could be administered on a 
case-by-case basis, to take account of "special needs." 

Procedure refers mostly to "procedural fairness." Egalitarians would be 
particularly sensitive here, and individualists would usually press for less red tape and 
quicker decisions. Hierarchs want things done in an orderly manner. 

The 3E's are largely susceptible to evaluation by economic measures (e.g., 
cost/benefit analyses), while the 3P's are fuzzier and more apt to need evaluation by 
intuition, which is of course dependent on the cultural views of the evaluator (is the 
process fair?). 







The term "public interest" seems almost to have been taken over by egalitarians. 
If, for instance, you do an Internet search for the term, you will find that most of the 
resulting web sites describe themselves as advocates for "consumers," "the environment" and 
"democracy", strong indications that the group leans to the egalitarian side. Arguing 
that all policies should be evaluated on the basis of their effects on "the least 
advantaged members of society" is most congenial to the egalitarian worldview. 
"Utilitarianism," on the other hand, argues that a good policy is one which maximizes 
overall social utility, a notion congenial to hierarchs. 

An axiomatic concept among most policy analysts is "Pareto Optimality," which 
says that we should redistribute to the point that any further redistribution would 
make someone less well off. However, egalitarians and hierarchs are quick to deny 
this economics-based criterion: for egalitarians it is perfectly all right to disadvantage 
the "advantaged" to help the "disadvantaged." Hierarchs think that this is all right if 
and when it enhances the general social welfare, but individualists tend to take a 
laissez faire approach, "blaming the victim," egalitarians charge. 



4 Fuzzy-Based Cultural Theory Model 

Our model utilizes Fuzzy Set Theory [7] and CT [2] in the context of integrated 
assessment models. Fuzzy set theory provides a means for the representation of 
imprecision and vagueness. Each fuzzy set, A, is defined in terms of a relevant 
universal set U by a membership function, denoted as $\mu_A(u)$, where $u \in U$. This 
function assigns to each element u of U a number, in the closed interval [0,1], that 
characterizes the degree of membership of u in A. That is, the membership function 
can take all values between 0 (zero) and 1 (one), including the discrete values of 0 
and 1. More formally, membership functions are functions of the 
form $A: U \to [0,1]$. In defining a membership function, the universal set U is always 
assumed to be a classical set. 

With respect to the computational model, the cultural groups can be thought of as 
fuzzy sets, since most individuals are not completely in one set but may belong to a 
set with a degree in [0,1]. For example, many liberal politicians are first egalitarians, 
but with a fairly strong bent toward hierarchy. Conservatives tend to be 
individualists, but also with hierarchical tendencies. In addition, the possible 
uncertainties may be associated with subjective judgments, policies, and disagreement 
among the experts and/or policy makers. Here we specifically aim to model the 
general characteristics of the cultural classifications and policies, and how the cultural 
classes affect the policies on global climate change, by utilizing fuzzy set theory. 
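
As a small illustration of this view, the sketch below stores, for each individual, a degree of membership in each of the four cultural groups. The individuals and all numeric degrees are hypothetical values chosen only to mirror the examples in the text (an egalitarian with a hierarchical bent, an individualist with hierarchical tendencies); they are not data from the study.

```python
GROUPS = ("F", "I", "H", "E")  # fatalist, individualist, hierarch, egalitarian

# Hypothetical membership degrees in [0, 1] for two illustrative individuals.
memberships = {
    "liberal_politician": {"E": 0.6, "H": 0.4, "I": 0.0, "F": 0.0},
    "conservative_politician": {"I": 0.6, "H": 0.3, "E": 0.0, "F": 0.1},
}


def is_valid(degrees):
    """Check that every group has a degree and each degree lies in [0, 1]."""
    return set(degrees) == set(GROUPS) and all(
        0.0 <= d <= 1.0 for d in degrees.values()
    )


def dominant_group(degrees):
    """Return the group in which the individual has the highest degree."""
    return max(degrees, key=degrees.get)


for person, degrees in memberships.items():
    print(person, is_valid(degrees), dominant_group(degrees))
```

The point of the representation is that no individual is forced into a single class: the secondary leanings stay available to later inference steps.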

The general approach for modeling policy decisions and cultural groups 
is shown in Figure 1. In this figure, the knowledge-based system includes a number of 
fuzzy if-then rules for capturing knowledge that basically represents the relationships 
among the policies and cultural groups. The relationships that exist in this application 
are usually imprecise and inexact by nature; therefore, we use linguistic variables to 
describe the elastic conditions in the “if-part” of fuzzy rules [8]. The fuzzy 
knowledge-based system will be able to perform inference under partial 
matching. 







[Figure 1: block diagram. Input (government policies {P1, ..., Pn}, cultural groups {F, I, H, E}, membership functions, possibility distributions, etc.) feeds the fuzzy knowledge-based system (fuzzy inference), whose output is a set of assessments; the diagram also shows manual evaluation, feedback, and manual modifications.] 

Fig. 1: General Architecture of the Fuzzy Model 



More specifically, our modeling approach starts with a fuzzy partitioning of the input 
space of a population into the four cultural groups, where each is modeled as a fuzzy 
set. A fuzzy partitioning allows a smooth transition from one subspace into a 
neighboring one, as illustrated in Figure 2; that is, the transition between two sets may 
be gradual. 



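[Figure 2: fuzzy partition of the input space into the four cultural groups; vertical axis: grid, horizontal axis: group] 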



In Figure 2, the y-axis, grid, describes the number and strength of the behavioral 
prescriptions and proscriptions imposed by living in a particular way. The x-axis, 
group, describes the strength of people’s attachment to the community that lives that 
way. As can be seen from the figure, cultural theory uses this group-grid typology to 
characterize individuals, forming four cultural leanings. The four 
classifications are termed individualists (I), egalitarians (E), hierarchs (H), and 
fatalists (F). For example, when a person’s social environment is characterized by 
strong group boundaries and binding prescriptions, they are termed hierarchs. 

As mentioned before, the entire input space is partitioned, which results in four 
fuzzy subspaces. The boundaries of these fuzzy subspaces may overlap, and the union 
of these subspaces is the entire space. More formally, a fuzzy partition of a space S 
is a collection of fuzzy subspaces such that, for any element of the space, its membership 
degrees in all subspaces always sum to 1. 










$\sum_i \mu_{G_i}(x_j) = 1,\ \forall x_j \in S$, with $G_i \in \{F, I, H, E\}$ and $\bigcup_i G_i = S$. 

Each $G_i$ is a fuzzy set and can be represented as $G_i = \sum_j \mu_{G_i}(x_j)/x_j$. 

For example, $E = \{0.1/h_1, \ldots, 1/h_j, 1/h_{j+1}, \ldots, 0.5/h_i, \ldots, 0.1/h_n\}$, where E stands for 
egalitarian and each $h_i$ is an individual or a cluster of individuals. This fuzzy set may 
be represented, for example, by a trapezoidal membership function as shown in 
Figure 3. There could be other ways of representing these fuzzy sets. Instead of 
assigning a membership value in [0,1] to each individual or cluster of individuals, 
we could use some fuzzy terms (or sets) such as: 

$E = \{weak/h_1, \ldots, very\text{-}strong/h_j, very\text{-}strong/h_{j+1}, \ldots, moderate/h_i, \ldots, weak/h_n\}$. 

Which technique is better will be determined later, after understanding more of the 
details of the application domain. The membership functions for the other cultural 
groups can be generated in similar fashion. 
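
A trapezoidal membership function and the sum-to-one partition condition can be sketched as follows. This is a minimal illustration, not the authors' implementation: the breakpoints `(0, 4, 6, 10)`, the ordering of individuals `h1` to `h9` along a single axis, and the raw degrees passed to `normalize` are all hypothetical.

```python
def trapezoid(x, a, b, c, d):
    """Trapezoidal membership: 0 outside (a, d), 1 on [b, c],
    linear on the two slopes. Assumes a < b <= c < d."""
    if x <= a or x >= d:
        return 0.0
    if b <= x <= c:
        return 1.0
    if x < b:
        return (x - a) / (b - a)  # rising edge
    return (d - x) / (d - c)      # falling edge


def normalize(degrees):
    """Rescale raw degrees so that they sum to 1 (fuzzy partition condition)."""
    total = sum(degrees.values())
    if total == 0:
        raise ValueError("element has zero membership in every group")
    return {group: value / total for group, value in degrees.items()}


# Hypothetical: individuals h1..h9 placed at positions 1..9 on one axis.
egalitarian = {f"h{i}": trapezoid(i, 0, 4, 6, 10) for i in range(1, 10)}

# Hypothetical raw degrees of one individual in the four groups.
partitioned = normalize({"F": 0.1, "I": 0.3, "H": 0.6, "E": 0.5})

print(egalitarian)
print(partitioned)
```

Normalizing raw degrees is only one way to satisfy the partition condition; membership functions can also be designed so that neighboring trapezoids sum to 1 by construction.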




Each policy (or statement) may be agreed or disagreed with by the cultural groups 
with a certain degree of confidence. Using the membership function of Figure 3, an 
example representing the degree of agreement of egalitarians with each policy is 
illustrated in Figure 4. The membership functions for the other cultural groups can be 
defined similarly. It is also necessary to partition the domain of policies and establish 
fuzzy relationships among the policies. A possible fuzzy partitioning of the domain of 
the policies and a representation of similarity relationships of policies is given in Figure 
5. The relationships among the policies and cultural groups can depend on the 
application, the form of data, etc., and especially on the form of the fuzzy rules 
represented in the fuzzy knowledge-based system, which will be used to capture these 
relationships. 




[Figure 4; horizontal axis: Policies] 

Fig. 4: The Strength of Agreement of the Egalitarian Group with the Policies 








Fig. 5: A Fuzzy Partition of Policies Used in the System 



So, our modeling approach involves identifying and defining a set of fuzzy rules 
which describe a functional mapping from a set of input variables 
with attached strengths to a set of output variables with strengths resulting from 
inference. In this specific application we identify our major input variables as a set of 
cultural groups (F, I, H, E) and a set of policies (P1, ..., Pn). The output variables are 
a set of assessments (A1, ..., An) about the policy. A fuzzy mapping rule imposes 
an elastic constraint on possible associations between input and output variables. The 
constraint is elastic because a fuzzy rule can describe input-output associations that 
are somewhat possible, i.e., the gray area between totally possible and totally 
impossible. 

The degree of possibility of an input-output association imposed by a rule R can 
be expressed as a possibility distribution [9], denoted by $\Pi_R$. Since a fuzzy relation 
can be thought of as a way of describing a possibility distribution, one can use the fuzzy 
relation to represent the possibility distribution imposed by a fuzzy rule. A fuzzy 
mapping rule is represented by the fuzzy relation formed by the Cartesian product of the 
variables referred to in the rule’s if-part and then-part. For example, the mapping rule 

R: If x is F, Then y is P 

can be mathematically represented as a fuzzy relation R defined as 
$\mu_R(x,y) = \mu_{F \times P}(x,y)$. If we use the min operator for the Cartesian product, the fuzzy 
relation R becomes 

$\mu_R(x,y) = \min\{\mu_F(x), \mu_P(y)\}$. 

The inference (i.e., interpolative reasoning) of such a fuzzy rule-based model is 
based on the compositional rule of inference. The net effect is a possibility 
distribution over the domain of definition of the output variable. 
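
The rule relation and the compositional rule of inference can be sketched over small discrete domains. This is a minimal illustration assuming the min operator for the Cartesian product and max-min composition, which is one common choice; the element names (`x1`, `y1`, ...) and all membership degrees are hypothetical.

```python
def rule_relation(mu_f, mu_p):
    """Fuzzy relation of 'If x is F then y is P' using the min operator:
    R(x, y) = min(mu_F(x), mu_P(y))."""
    return {(x, y): min(fx, py) for x, fx in mu_f.items() for y, py in mu_p.items()}


def infer(mu_input, relation):
    """Max-min composition: mu_out(y) = max over x of min(mu_in(x), R(x, y))."""
    out = {}
    for (x, y), r in relation.items():
        candidate = min(mu_input.get(x, 0.0), r)
        out[y] = max(out.get(y, 0.0), candidate)
    return out


# Hypothetical fuzzy sets F (over inputs) and P (over outputs).
mu_f = {"x1": 0.2, "x2": 1.0, "x3": 0.6}
mu_p = {"y1": 0.3, "y2": 0.9}
relation = rule_relation(mu_f, mu_p)

# Hypothetical observed input that only partially matches F.
observed = {"x1": 0.0, "x2": 0.5, "x3": 1.0}
result = infer(observed, relation)
print(result)
```

The result is a possibility distribution over the output domain even though the observed input does not match F exactly, which is the partial matching behavior described in the text.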

Among the three types of fuzzy rule-based models for functional approximation, it 
seems that the Mamdani model [10] fits our application best. This model is well 
known and presented in the literature in detail. 







5 An Application: An Electric Power Deregulation Policy 

The way we envision this application is that one takes a policy proposal: e.g., 
deregulation of electric power production, and then looks at the responses of various 
actors in the debate over the issues surrounding the policy, based on domain specific 
data provided. 

One begins, perhaps, with a classification of actors by whether they are for, 
against, or neutral in regard to the policy proposed, then looks at the arguments they 
make, supporting their case. These arguments can be categorized in terms of CT. For 
instance, actors opposing deregulation of electric power production will say there is 
little or no need for it. A Colorado Public Utilities Commission (COPUC) 
questionnaire asked if the need for deregulation was strong, slight, or none. COPUC 
also asked if the competition engendered by such deregulation might hurt the 
environment. There exists a cross-tabulation of responses to these two questions: is 
there a need for change? And, will the competition engendered by such a change 
threaten the environment? 

We would hypothesize that egalitarians would think there is not a strong need for 
deregulation (because they see the market as unfair, arguing that it is apt to penalize 
the poor and reward the rich). Since egalitarians tend to oppose most policies aimed 
at deregulation of almost anything, we would suppose that those who deny there is a 
need for deregulation might also be worried about threats to the environment. 
Egalitarians also tend to be far more concerned about threats to the environment than 
are individualists. We would expect individualists to favor deregulation and to say it 
won't hurt the environment, and expect egalitarians to take the other position. 

One can take the set of data (responses to the COPUC questionnaire) and assign 
respondents to the cultural categories. One can then give each category a weight 
proportionate to the percentage of respondents who fit into it, and predict the reaction 
to each policy proposal on the basis of its policy goal. The policy proposal may 
promote or threaten efficiency, equity, or order. These policy preferences are 
respectively those of individualism, egalitarianism, and hierarchy. 

So we would have to take a policy proposal (e.g., deregulation) and classify it in 
cultural terms (policies aimed at deregulation ought to be more attractive to 
individualism than to egalitarianism or hierarchy). We would then inventory the 
actors who support, oppose, or are neutral about the policy, and assign them to 
boxes, possibly with a "power" weight corresponding to their number. 

Suppose we find out that of 50 actors 25 seem to lean toward egalitarianism, 20 
toward individualism, and only 5 toward hierarchy. If this were the case, we could 
see (assuming other power variables are roughly equal) that egalitarians would be 
in a fairly good position to get their way, since they are a plurality and because 
hierarchs tend not to want to deregulate very badly. 
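
The "power" weights in this example are simple proportions of the actor counts. The sketch below computes them for the 50 actors mentioned above; treating the weight as the plain share of actors is the assumption stated in the text (other power variables roughly equal).

```python
# Actor counts from the example: 25 egalitarian, 20 individualist, 5 hierarchic.
counts = {"E": 25, "I": 20, "H": 5}

total = sum(counts.values())
weights = {group: n / total for group, n in counts.items()}

# The group with the largest share of actors.
largest = max(weights, key=weights.get)

print(weights)   # share of actors per cultural leaning
print(largest)
```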

Here, culture, of course, is only one of the inputs. Other inputs could be included 
such as economic interests, status interests, etc. Actors tend to use arguments that 
support their economic/status interests, but these arguments can usually be seen as 
consistent or internally contradictory. 

For this deregulation problem, we divided the society into several groups: 
(1) Producers of power (larger producers, small producers, and public-owned 
producers), (2) Consumers of power (large private consumers, large public 
consumers, and small consumers), (3) Regulators (federal, state, and local), and 
(4) Policy advocates (pro deregulation, anti deregulation, and neutral). 

Based on the given actual or fictitious data, we assign cultural dimensions to each 
group. To do this we could evaluate whether the group is pro, anti, or neutral towards 
deregulation, and examine their rationale for their position. 

Policy advocates only have opinions pro deregulation, anti deregulation, and 
neutral about deregulation. So this group’s membership function includes the 
following values (as example inputs): 

policy_advocates = {strongly/pro_deregulation, 
weakly/anti_deregulation, weakly/neutral}   (1) 

All other groups have opinions about the equity, efficiency, effectiveness, liberty, 
fairness, and order of the deregulation. So their membership function consists of the 
following values: 

large_producers = {medium/equity, small/efficiency, medium/effectiveness, 
small/liberty, small/fairness, medium/order}   (2) 

Similar input values may be assigned to other social groups. We convert these 
groups, including policy advocates, to cultural classes such as 

policy_advocates = {0.8/H, 0.2/I, 0.3/E} 
large_producers = {0.7/H, 0.4/I, 0.5/E} 

by matching each group to cultural classes whose membership functions consist of 
pro_deregulation, anti_deregulation, and neutral about the deregulation, and the equity, 
efficiency, effectiveness, liberty, fairness, and order of the deregulation, like 

Hierarchy = {mod_pro/pro_deregulation, mod_anti/anti_deregulation, 
mod_neu/neutral, small/equity, small/efficiency, med/effectiveness, 
very/liberty, med/fairness, small/order}   (3) 

Individualist = {strong_pro/pro_deregulation, weak_anti/anti_deregulation, 
weak_neu/neutral, very/equity, very/efficiency, very/effectiveness, 
very/liberty, med/fairness, small/order}   (4) 

Egalitarian = {weak_pro/pro_deregulation, strong_anti/anti_deregulation, 
weak_neu/neutral, small/equity, small/efficiency, small/effectiveness, 
med/liberty, small/fairness, small/order}   (5) 
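In detail, the matching can be illustrated by the matching of policy advocates to 
cultural classes: 

policy_advocates = {$\mu_1$/H, $\mu_2$/I, $\mu_3$/E}   (6) 

where $\mu_1$, $\mu_2$, and $\mu_3$ are values in [0,1] to be calculated. We can get $\mu_1$, $\mu_2$, and $\mu_3$ 
by requiring that the following expression 

$$
\begin{aligned}
&\left|\,\text{strong\_pro} - \text{mod\_pro}\cdot\mu_1 - \text{strong\_pro}\cdot\mu_2 - \text{weak\_pro}\cdot\mu_3\,\right| \\
{}+{} &\left|\,\text{weak\_anti} - \text{mod\_anti}\cdot\mu_1 - \text{weak\_anti}\cdot\mu_2 - \text{strong\_anti}\cdot\mu_3\,\right| \\
{}+{} &\left|\,\text{weak\_neu} - \text{mod\_neu}\cdot\mu_1 - \text{weak\_neu}\cdot\mu_2 - \text{weak\_neu}\cdot\mu_3\,\right| \qquad (7)
\end{aligned}
$$

where all these fuzzy words are from the pro_deregulation, anti_deregulation, and neutral 
parameters of (1), (3), (4) and (5), is minimal. 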
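
One way to carry out this minimization numerically is a brute-force search over a grid of candidate degrees. The sketch below is an illustration only: the numeric encoding of the fuzzy words (`weak` = 0.2, `mod` = 0.5, `strong` = 0.8), the grid step, and the helper names are assumptions that do not appear in the paper, so the numbers it produces depend entirely on that encoding and are not meant to reproduce the membership values quoted in the text.

```python
from itertools import product

# Assumed numeric encoding of the fuzzy words (not given in the paper).
LEVEL = {"weak": 0.2, "mod": 0.5, "strong": 0.8}

# Opinions in the order (pro_deregulation, anti_deregulation, neutral).
group = [LEVEL["strong"], LEVEL["weak"], LEVEL["weak"]]        # from (1)
classes = {
    "H": [LEVEL["mod"], LEVEL["mod"], LEVEL["mod"]],           # from (3)
    "I": [LEVEL["strong"], LEVEL["weak"], LEVEL["weak"]],      # from (4)
    "E": [LEVEL["weak"], LEVEL["strong"], LEVEL["weak"]],      # from (5)
}


def mismatch(mu):
    """Sum over the three opinions of
    |group value - weighted sum of the class values|."""
    return sum(
        abs(group[k] - sum(mu[c] * classes[c][k] for c in classes))
        for k in range(3)
    )


def fit(step=0.1):
    """Grid search over degrees in [0, 1] for the smallest mismatch."""
    grid = [round(i * step, 10) for i in range(int(round(1 / step)) + 1)]
    best = min(
        product(grid, repeat=len(classes)),
        key=lambda m: mismatch(dict(zip(classes, m))),
    )
    return dict(zip(classes, best))


degrees = fit()
print(degrees, mismatch(degrees))
```

With this particular encoding the input in (1) coincides with the individualist profile in (4), so the search returns full membership in I and none elsewhere. A finer grid or a proper optimizer could replace the exhaustive search when more classes or opinions are involved.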

Using the same method we can express all groups in terms of cultural classes. So now 
all groups are converted to cultural classes like 

policy_advocates = {0.8/H, 0.2/I, 0.3/E} 
large_producers = {0.7/H, 0.4/I, 0.5/E}   (8) 

Similarly, the conversions can be done for the other social groups. 









        support    oppose    neutral 
E       weak       strong    weak 
H       mod        weak      strong 
I       strong     weak      weak 

Fig. 6: The Opinions of the Cultural Groups on Electric Deregulation Policy 

Using a mapping technique (i.e., composition) between the groups and the table above, 
we can get each group’s opinion of support, oppose, and neutral, with a degree if 
necessary. For example, 

policy_advocates = {mod/support, weak/oppose, strong/neutral}, and 
large_producers = {strong/support, weak/oppose, weak/neutral} 

Lastly, by taking the union of all these opinions, we can get the final result as 
follows: 

result for support: strong, result for oppose: weak, result for neutral: moderate 
That is, the output is: {strong/support, weak/oppose, moderate/neutral} 
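
The composition and union steps can be sketched as follows. This is a hedged illustration rather than the authors' procedure: it assumes max-min composition, max for the union, and a hypothetical numeric encoding of the fuzzy words (`weak` = 0.2, `mod` = 0.5, `strong` = 0.8). Because the paper's example values are illustrative and its encoding is not stated, the numbers below need not coincide with the linguistic results quoted above.

```python
# Assumed numeric encoding of the fuzzy words (not given in the paper).
LEVEL = {"weak": 0.2, "mod": 0.5, "strong": 0.8}
OPINIONS = ("support", "oppose", "neutral")

# Opinions of the cultural groups, as in Fig. 6.
table = {
    "E": {"support": "weak", "oppose": "strong", "neutral": "weak"},
    "H": {"support": "mod", "oppose": "weak", "neutral": "strong"},
    "I": {"support": "strong", "oppose": "weak", "neutral": "weak"},
}

# Social groups expressed in terms of cultural classes, as in (8).
groups = {
    "policy_advocates": {"H": 0.8, "I": 0.2, "E": 0.3},
    "large_producers": {"H": 0.7, "I": 0.4, "E": 0.5},
}


def compose(group):
    """Max-min composition of a group's class memberships with the table."""
    return {
        op: max(min(deg, LEVEL[table[c][op]]) for c, deg in group.items())
        for op in OPINIONS
    }


def union(results):
    """Fuzzy union (max) of the per-group opinions."""
    return {op: max(r[op] for r in results) for op in OPINIONS}


per_group = {name: compose(g) for name, g in groups.items()}
overall = union(per_group.values())
print(per_group)
print(overall)
```

Mapping the numeric degrees back to words (for example, the nearest of 0.2, 0.5, 0.8) would give a linguistic output in the style of the text; that final step is left out here because the thresholds would be one more assumption.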



6 Conclusion 

In this paper we have described a computational model for integrated assessment 
based on cultural theory (CT). While utilizing CT in integrated assessment models 
(IAMs), we also used fuzzy set theory to deal with the inexactness or uncertainty 
inherent in both CT and IAMs. Our fuzzy-based modeling approach provides a 
mechanism for understanding the reaction of a populace to environmental policy 
decisions. We modeled and implemented an application, namely deregulation of 
electric power production, by using the fuzzy-based computational model described 
here. Using this model for more complex applications is one of our on-going research 
efforts in the short term. 



References 

1. Risbey, J., M. Kandlikar, and A. Patwardhan, “Assessing Integrated Assessments”, 
Climate Change, 34, pp: 369-395, 1996. 

2. Thompson, M., R. Ellis and A. Wildavsky, Cultural Theory, Westview Press, 
Boulder, CO, US, 1990. 

3. Pendergraft, C. “Using Cultural Theory: Environmental Concerns in Texas and 
Louisiana,” in progress. 

4. Malone, E., S. Rayner and M. Thompson, “ Human Choice and Climate Change,” 
National Association of Environmental Professionals, 23rd Annual Conference, 
San Diego, CA, USA, June 1998. 

5. Polyani, M. Personal Knowledge, University of Chicago, 1962, pp: 322. 

6. van Asselt, M., J. Rotmans, M. den Elzen, and H. Hilderink, “Uncertainty in
Integrated Assessment: A Cultural Perspective-Based Approach”, GLOBO Report 
Series No. 9, RIVM, Bilthoven, The Netherlands, 71 pp. 








7. Zadeh, L., “Fuzzy Sets”, Information and Control, 8, pp: 338-353, 1965. 

8. Pedrycz, W. and F. Gomide, An Introduction to Fuzzy Sets: Analysis and Design, 
MIT Press, Cambridge MA, 1998. 

9. Dubois, D. and H. Prade, Possibility Theory - An Introduction to Computerized 
Processing of Uncertainty, Plenum Press, New York, 1988. 

10. Mamdani, E. and S. Assilian, “An experiment in Linguistic Synthesis with a 
Fuzzy Logic Controller”, Int. Journal of Man-Machine Studies, 7, pp: 1-13, 1975. 




Fuzzy Knowledge-Based System for Performing 
Conflation in Geographical Information Systems 



Harold Foley¹ and Fred Petry²

¹ Xavier University of Louisiana, New Orleans, LA, USA 70125
hafoley@xula.edu

² Tulane University, New Orleans, LA, USA 70118
petry@eecs.tulane.edu



Abstract. The major advantage of geographical information systems (GIS)
is their ability to efficiently manage geographical data. GIS have the
ability to accommodate various types of geographical data from multiple
sources. Thus, a challenging problem facing GIS is how to effectively
integrate and utilize the various types of geographic data.
The process by which these different information sources are merged in
order to yield a more comprehensive dataset is referred to as conflation.
In this paper, we describe how a fuzzy knowledge-based system can be
utilized in accomplishing this task.

Keywords: geographical information systems, conflation, feature matching,
fuzzy knowledge-based system



1 Introduction 

GIS, in short, can be considered to be database management systems specifically
designed to manage and manipulate geographical data. A further discussion of
GIS can be found in [Bur96].

As GIS become increasingly popular, methodologies are needed for efficiently
integrating geographical data from multiple sources. This process is
referred to as conflation [Saa88]. Conflation can be defined as the process of
merging multiple geographic data sets for the purpose of developing a more
comprehensive geographic data set (map). Conflation must be able to handle
differences in spatial data content, scales, and data acquisition methods, manage
uncertainty, and detect and remove redundant and ambiguous geographical
features. It provides a means by which geographical data from multiple sources
can be integrated, and thus utilized for improved analysis.

We have all seen maps of the same area, each with a different objective or
theme. As a result, the data itself is quite diverse, and a plethora of useful
information is dispersed among various datasets. Unfortunately, there does not
exist a framework where this potentially useful data can be integrated.
Consider the following problem.

Suppose urban planners have to decide where a landfill should be located.
In making such a decision, there are a number of criteria that have to be
considered. For instance, the planners must consider the following issues:



R. Loganantharaj et al. (Eds.): IEA/AIE 2000, LNAI 1821, pp. 260-269, 2000.
(c) Springer-Verlag Berlin Heidelberg 2000






1. Site should be easily accessible, yet not near a heavily used roadway 

2. Site should not be near a community 

3. Site should not be near a water supply 

4. Entire site area must exist within parish lines 

These criteria, in addition to a number of others, may have to be considered
in order for a decision to be made. In all likelihood, such diverse information is
spread among various datasets. That is, one dataset may have information
describing the transportation system which, in turn, indicates roadway
utilization. Another may depict political boundaries and the census information for
various communities. Still another may map the hydrological details in great
detail. With all of these datasets, the planners have the daunting task of
taking into consideration all of the criteria, and making a decision based upon
this dispersed, although compatible, data. Typically, integrating the datasets
would require a cartographer or geographer manually “piecing together” all the
pertinent data into an integrated framework [Rob88]. Such a process is prone to
error, extremely time-consuming, and labor intensive. A system that is able to
automate such a task would, therefore, be quite beneficial.

In addition to these benefits, conflation has other ramifications as well. For
instance, if updates (e.g., add, edit, delete) are made to a dataset, all other
compatible datasets (i.e., datasets of the same geographical area) can be updated
accordingly. For example, say a building is added to a dataset. All other datasets
that map such features can be updated as well.

The examples above demonstrate the need for, and the advantages of, a
conflation methodology. That is, it is better to view, analyze, and utilize data
in the context of other related data rather than analyzing data as separate and
unrelated entities [Fol97]. Whenever we as humans want to determine the meaning
of something, or find added meaning in it, we typically consider it in its proper
context. We apply the same principle in this research: by determining and
utilizing a context for geographical data, we can attain added meaning from it.

Conflation can be characterized by two subprocesses: feature matching and
deconfliction. Features can be considered to be representations of real-world
entities. Feature matching is therefore defined as the process of matching
corresponding features that are contained in multiple datasets, that is, determining
which feature in one dataset matches the same feature in another dataset. The
difficulty is that the features, even though they are the same, may have different
representations, different attributes describing them, conflicting values, etc.
Thus, various characteristics that describe these features have to be considered.
Deconfliction, on the other hand, provides a means whereby incorrect matches
that are made during feature matching can be both identified and resolved
[Fol97,Rob88].

As for our overall approach, we have identified three kinds of criteria that
cartographers use to perform conflation: spatial, nonspatial, and topological.
This requires us to analyze how each criterion is used and how it assists in the
process. That is, we devise a methodology for how a cartographer uses these
characteristics to determine matching features. Although the characteristics
differ greatly in meaning, we compute a degree-of-similarity measure for each
one. This requires us to devise a strategy for computing these varying
similarity measures, which represents the crux of the research problem.

In this paper, we primarily focus on the nonspatial reasoning approach to
performing feature matching. Furthermore, we describe how a
fuzzy knowledge-based system can be employed to compute similarity measures
with respect to nonspatial attributes.



2 Nonspatial Reasoning 



In the past most feature matching, as well as other related areas, focused pri- 
marily on the geometrical aspects of geographical data. As a result, very little 
emphasis was put on nonspatial data (also known as thematic or attribute). We, 
on the other hand, feel that nonspatial data is abundant with useful information. 
Therefore, we incorporate a method for utilizing such resourceful information in 
our overall feature matching scheme. 

Non-spatial attributes are considered as describing all characteristics related 
to a feature, but not pertaining to either the geometric or topologic types of 
data. In other words, nonspatial data can be considered to be the descriptive 
information related to a feature. Also, nonspatial data typically contains both 
linguistic and numerical data. Examples of nonspatial information include the 
name of a feature, soil type of a region, population of a city, type of road, etc. 

Unfortunately, reasoning with nonspatial data can be more daunting than
processing geometric data, since nonspatial data is often
non-numerical. With numerical data we have a well-established mathematical
framework for formulating it. Yet, when an attribute takes linguistic terms as
values, we do not have such a formal framework. Therefore, we must rely on
automated reasoning techniques. Another characteristic of some nonspatial data
is its inherent uncertainty, imprecision, and inexactness. For feature matching,
it is often the case that feature attributes do not match exactly. Thus, similar to
what is done for feature matching with respect to spatial attributes, a degree
of similarity must be produced. Moreover, it is our objective to simulate the
sometimes unpredictable reasoning behavior of an expert. Hence, we propose
the use of a fuzzy knowledge-based system.

In this section, we present the issues and problems associated with nonspatial
reasoning. Also, we describe the approach we use for performing this task. In
addition, we provide a description of a prototype fuzzy knowledge-based system
and present our results from a practical test case. Finally, we propose a method
for automatically acquiring the knowledge to be used in the fuzzy
knowledge-based system.






2.1 Fuzzy Knowledge-Based System 

In order for a knowledge base to be successful for GIS, it must perform very well
under uncertain conditions [Daw95]. In addition, it must be practical, so
that knowledge can be more easily acquired from the expert.

Currently, there are a number of ways in which uncertainty is handled. Some 
uncertainty measures include certainty factors, Dempster-Shafer, and Bayesian 
networks. Such methods are based on probability theory, yet experts do not typ- 
ically think in terms of probability. An alternative to these uncertainty measures 
is fuzzy set theory. 

We consider fuzzy set theory to be very compatible with the feature matching
domain. For instance, a much utilized aspect of fuzzy set theory is the degree of
membership in a set. Instead of determining a degree of membership, we measure
a degree of similarity [Fol97]. As the name implies, the measure represents a
degree of proximity with respect to matching. Essentially, we embed various
aspects of the fuzzy set theory paradigm into our inferencing and knowledge
representation schemes in order to reason efficiently and draw sound conclusions.
Also, the theoretical framework of fuzzy sets allows uncertainty management
to be handled informally. Therefore, we have augmented the KBS approach to
accommodate “fuzziness.” Hence, we have a fuzzy knowledge-based system (FKBS).
The objective of the FKBS is to perform inexact reasoning on the nonspatial
data and return its conclusions. These results will be used as evidence as to
whether a candidate matching pair do in fact correspond. Similar to a typical
KBS, the FKBS is partitioned into two major components: the inference engine
and the knowledge base. The knowledge base contains the expert domain knowledge
for solving the problem. The inference engine, on the other hand, utilizes the
domain knowledge together with acquired information about the problem so that
an expert solution can be reached.



Knowledge Representation. One of the most important considerations in
constructing an FKBS, or any other KBS for that matter, is how the knowledge
is to be structured for both the knowledge base and the inference engine. In
this research, we consider there to be two types of knowledge that have to be
represented. The first type, object knowledge, represents the features contained
in the geographical datasets. Essentially, this type of knowledge contains the
expert domain knowledge to be used for problem solving. The second type of
knowledge can be considered to be the inferential or rule-based knowledge. This
knowledge makes inferences based upon the given object knowledge. This component
is the basis of the inference engine. Both serve as the building blocks for the
overall FKBS.

2.2 Fuzzy Rule Base Structure 

The fuzzy rule base represents an encoding of the reasoning process of an
expert (e.g., a cartographer or geographer) familiar with the domain and various
data merging techniques. The inference engine uses this knowledge to arrive at an
expert conclusion. The rule base represents the overall problem-solving ability
of the system, which is responsible for determining what piece of knowledge to use
next. It will perform the necessary actions based upon the current execution
state.

In constructing the rule base, we use an IF-THEN construct, which
is characteristic of most knowledge bases. We chose rules because
of their simplicity, popularity, and proximity to human reasoning processes. Rules
are of the format:

if premise then consequent

The IF portion of the rule is a premise which tests a truth value. The THEN
portion of the rule infers a new set of facts or a new state. Thus, the consequent
executes if and only if the premise evaluates to true.
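
The premise/consequent structure can be illustrated with a minimal Python sketch. This is not the system described in the paper: the `Rule` class, the `fire` function, and the fact names are hypothetical and serve only to show a consequent running exactly when its premise holds.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

Facts = Dict[str, Any]


@dataclass
class Rule:
    """A production rule: IF premise(facts) THEN consequent(facts)."""
    name: str
    premise: Callable[[Facts], bool]      # the IF part, tests a truth value
    consequent: Callable[[Facts], None]   # the THEN part, adds new facts


def fire(rules: List[Rule], facts: Facts) -> List[str]:
    """Apply each rule once, in order; return the names of the rules that fired."""
    fired = []
    for rule in rules:
        if rule.premise(facts):           # consequent runs only if premise is true
            rule.consequent(facts)
            fired.append(rule.name)
    return fired


# Hypothetical facts about a candidate pair of features.
rules = [
    Rule("same_name",
         premise=lambda f: f["name_1"] == f["name_2"],
         consequent=lambda f: f.update(name_match=True)),
    Rule("different_type",
         premise=lambda f: f["type_1"] != f["type_2"],
         consequent=lambda f: f.update(type_conflict=True)),
]
facts = {"name_1": "Main St", "name_2": "Main St",
         "type_1": "road", "type_2": "road"}
fired = fire(rules, facts)
```

Here only the first rule fires, so `facts` gains a `name_match` entry and no `type_conflict` entry.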

2.3 Uncertainty in Rule Base 

Since human reasoning techniques are difficult to model and the representation
of geographical data is usually inconsistent with respect to both content and
structure, neither facts nor rules will be totally precise. Thus, the FKBS has to
model this uncertain nature of the data.

Although seemingly insignificant, uncertainty can have a strong impact on the
overall solutions made by the system. Some researchers have even argued that
the inability of such systems to reason with uncertainty restricts their
performance [Rob88].

Consider two objects, Feature-1 and Feature-2, in Table 1 and assume
some attribute values associated with the features do not match. Yet the
features may, in fact, correspond to each other; that is, they are matching features.
Because there is a slight discrepancy in their representations, do we simply
eliminate the pair from consideration? This would, in all likelihood, be an
unreasonable and impractical approach to feature matching. An alternative is
to determine the degree of similarity that exists between the features. That is,
even though the features do not match exactly in their entirety, if they are
reasonably similar in structure and content, there is a strong possibility that
they do correspond to each other.

In order to deal with the inherent uncertainty that exists in geographical
data, we have developed the following framework. We consider each feature to
be a set of attribute-value pairs. That is, each feature is represented as a frame
structure.

The two features need not have the same number of attributes; that is, it is
not necessary that n = m (Table 1). The objective of the matching scheme is to
determine the degree of matching among the corresponding or related attributes.
This degree of matching is used to determine the overall “matching similarity”
between given features with respect to the nonspatial attributes. This is not the
same as the formal similarity relationships in fuzzy set theory.



2.4 Two Similarity Relationships 

The key is the degree of similarity of the individual attribute values. There
are two types of domains for which we must specify the degree of similarity:
numeric and linguistic.

Table 1. Various geographic data models

Feature-1                  Feature-2
$(a_{11}, v_{11})$         $(a_{21}, v_{21})$
$(a_{12}, v_{12})$         $(a_{22}, v_{22})$
...                        ...
$(a_{1n}, v_{1n})$         $(a_{2m}, v_{2m})$

In general, we have formulated membership similarity
functions for numeric domains and similarity tables for the similarity degree of
linguistic domain elements. A major difficulty is actually acquiring these for each
domain in the geographical data description, as there may be several hundred
such domains.

We have defined matching on nonspatial attributes to be a two-phase process:
(1) one phase captures the degree of similarity among attribute value pairs
for a particular attribute, and (2) the other represents the inherent semantic
relationships among the attributes. For (1), similarity tables are constructed and
utilized for each attribute where possible. In the cases where a similarity measure
is not applicable (e.g., a boolean attribute), exact matching is performed. Each
cell in the similarity matrix contains a value in the range [0, 1] for a pair of
attribute domain values, whereby each similarity value represents a degree of
matching between attribute values.

To illustrate this principle, consider the attribute $a_k$. Let
$D_k = \langle v_1, v_2, \ldots, v_n \rangle$ be the domain of attribute $a_k$.
Then we define the similarity function on the domain $D_k$ as

$$S : D_k \times D_k \to [0, 1]$$

This is denoted by $S(v_i, v_j)$.

Let us partition $D_k$ into two sets, $DS_k$ and $DU_k$. $DS_k$ contains the values of
the domain that are specific values for $a_k$ (i.e., attribute values other than other
or unknown). $DU_k$ contains the values that are not specific to $a_k$ (i.e., other
or unknown attribute values).

So similarity then has the following properties:

Symmetric, where $S(v_i, v_j) = S(v_j, v_i)$.

Reflexive, where $S(v_i, v_i) = 1,\ v_i \in DS_k$.

However, note that in general for unspecific attribute values

$$S(v_i, v_j) < 1, \quad v_i \in DU_k.$$

Also, note this relation is not generally transitive.
Given $S(v_i, v_j) = S(v_j, v_k) = V$,
it is not necessarily the case that
$S(v_i, v_k) = V$, where $V \in [0, 1]$.






Thus, there are $n$ possible values for $a_k$. The objective is to represent the
degree of similarity between every attribute value pair. We represent this by an
$n \times n$ matrix. Therefore, we may get the following.

$$
\begin{array}{c|cccc}
       & v_1    & v_2    & \cdots & v_n    \\ \hline
v_1    & s_{11} &        &        &        \\
v_2    & s_{21} & s_{22} &        &        \\
\vdots & \vdots & \vdots & \ddots &        \\
v_n    & s_{n1} & s_{n2} & \cdots & s_{nn}
\end{array}
$$

Each cell in the above matrix contains a similarity value, $s_{jk}$, where this
value represents the similarity between domain values $v_j$ and $v_k$.
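
A similarity table of this kind can be sketched in Python as follows. The domain values ("paved", "gravel", "dirt"), the numbers in the table, and the constant used for unspecific values are all hypothetical; the paper does not list them. Only the lower triangle is stored, since the relation is symmetric.

```python
# Hypothetical linguistic domain for a "road surface" attribute.
SPECIFIC = {"paved", "gravel", "dirt"}     # plays the role of DS_k
UNSPECIFIC = {"other", "unknown"}          # plays the role of DU_k

# Lower triangle of the similarity matrix; keys are alphabetically sorted pairs.
# The values are made up for illustration.
_TABLE = {
    ("gravel", "paved"): 0.7,
    ("dirt", "gravel"): 0.7,
    ("dirt", "paved"): 0.2,
}

# Assumed similarity whenever an unspecific value is involved (must be < 1).
UNSPECIFIC_SIM = 0.5


def similarity(v_i: str, v_j: str) -> float:
    """Return S(v_i, v_j) in [0, 1] for two values of the domain."""
    domain = SPECIFIC | UNSPECIFIC
    if v_i not in domain or v_j not in domain:
        raise ValueError("value outside the attribute domain")
    if v_i in UNSPECIFIC or v_j in UNSPECIFIC:
        return UNSPECIFIC_SIM              # never a full match, even for equal values
    if v_i == v_j:
        return 1.0                         # reflexive on specific values
    key = tuple(sorted((v_i, v_j)))        # symmetric lookup
    return _TABLE[key]
```

With these made-up values, "paved" and "gravel" are similar to degree 0.7, as are "gravel" and "dirt", yet "paved" and "dirt" are similar only to degree 0.2, which illustrates why the relation is not transitive in general.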

For representing the inherent relationships among attributes, we construct 
rules for representing the cases of particular interest where these situations can 
either positively or adversely affect the degree of matching. For instance, say 
we are comparing the similarity of two airports. Both have the same number of 
runways, but one is used for commercial use and the other is used for military 
purposes. Given this case, there is evidence that these features may not be the 
same because of such a disparity. Also, some experts may even eliminate the 
pair from further consideration since they may weigh the airport usage more 
heavily than the number of runways. So, there are even attributes that carry 
more weight or significance than others. 

We can model the concept of weight associated with an attribute by adding
another variable, $w_i$, to the attribute-value pair, $\langle a_i, v_i \rangle$.
As a result, we will have the following triple $\langle a_i, v_i, w_i \rangle$,
where $w_i$ is the weight associated with the attribute $a_i$.

These are the types of reasoning processes that must be identified and repre- 
sented in our knowledge base. We will demonstrate this in the following sections. 
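
To make the role of the weights concrete, the sketch below combines per-attribute similarities into one score. The paper does not state how the weighted values are aggregated, so the weighted mean used here is purely an assumption, as are the function name and the numbers in the airport example.

```python
from typing import List, Tuple

# (attribute name, similarity of the two features' values, weight)
WeightedSimilarity = Tuple[str, float, float]


def overall_similarity(items: List[WeightedSimilarity]) -> float:
    """Weighted mean of per-attribute similarities (an assumed aggregation)."""
    total_weight = sum(w for _, _, w in items)
    if total_weight == 0:
        return 0.0                         # no attribute carries any weight
    return sum(s * w for _, s, w in items) / total_weight


# Hypothetical airport pair: same number of runways, different usage,
# with usage weighted more heavily than the runway count.
airport_pair = [("runways", 1.0, 0.5), ("usage", 0.0, 1.0)]
score = overall_similarity(airport_pair)   # (1.0*0.5 + 0.0*1.0) / 1.5
```

Because the heavily weighted usage attribute disagrees, the overall score stays low even though the runway counts match, which mirrors the airport example above.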



2.5 Representation of Knowledge 

Just as important as the representation of inferential knowledge is the issue of 
how features are to be represented. 

GIS features are described by a number of attributes. One of the most use- 
ful types of knowledge representation structures is a frame structure. A frame 
structure supports the property inheritance philosophy, where elements of spe- 
cific classes inherit attributes and values from more general classes in which they 
are included. 

A frame-based system is sometimes considered synonymous with the
object-oriented paradigm; yet, unlike an object in that paradigm, a frame structure
does not have behavior associated with it. Nevertheless, we refer to the frame
representation of entities as either objects or structures.
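
Property inheritance in a frame structure can be sketched as a chain of slot lookups. The `Frame` class and the airport slots below are hypothetical and only illustrate the idea; frames here hold data only and carry no behavior beyond the lookup.

```python
class Frame:
    """A data-only frame: named slots plus an optional more general parent frame."""

    def __init__(self, name, parent=None, **slots):
        self.name = name
        self.parent = parent
        self.slots = dict(slots)

    def get(self, slot):
        """Look up a slot locally, then in successively more general frames."""
        frame = self
        while frame is not None:
            if slot in frame.slots:
                return frame.slots[slot]
            frame = frame.parent
        raise KeyError(slot)


# Hypothetical class hierarchy: Feature > Airport > Airport-1.
feature = Frame("Feature", source="unknown")
airport = Frame("Airport", parent=feature, category="transportation")
airport_1 = Frame("Airport-1", parent=airport, runways=2, usage="commercial")
```

`airport_1.get("category")` is answered by the `Airport` frame and `airport_1.get("source")` by the `Feature` frame, while `runways` is found locally.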






3 A Fuzzy Matching System 

Since human reasoning techniques are quite vague and the representation of ge- 
ographical information is typically inconsistent with respect to both the content 
and structure, it is usually true that facts and rules are neither totally certain nor 
consistent [Fol97]. Thus, we have developed a technique for effectively capturing 
and structuring this inexact and uncertain information. Since it is rare to have 
a case where corresponding features match identically, it would be unreasonable 
to have a conflation system that simply selects matching features on an identical 
basis in general. Furthermore, it would be just as infeasible to match features 
with very little relevancy. Thus, a system that has the ability to consider various 
degrees and types of evidence is more desirable and practical. In this section, 
we describe the prototype system that was developed for performing feature 
matching. 

In order to exploit the semantic nature of the data, we have developed an
expert system that can perform heuristic knowledge processing. The expert
system was developed in NEXPERT by Neuron Data, Inc. NEXPERT is an expert
system shell that supports forward, backward, and mixed chaining. It allows for
the creation of expert system rules, classes, and objects. Rules are of the
standard IF-THEN-ELSE production rule format. NEXPERT’s data organization
scheme adheres to the standard frame-based paradigm. Since NEXPERT code
is precompiled to C, it integrates easily with the other system modules. C calls,
in turn, are used to invoke the NEXPERT inference engine.

In constructing the KB, we partitioned the rule set into distinct rule subsets 
where each subset analyzes a specific domain. Such a decomposition has proven 
beneficial in modularity, understandability, and analysis. 

A sample rule subset would consist of rules relevant to analyzing the matching
of railways. In the following figure, a snapshot of the NEXPERT development
shell demonstrates a rule subset that analyzes railroad features RR1 and RR2.
The codes appearing after the period represent a given attribute describing the
feature.

Translating the production rule for rule 3 (r3), we get the following.

IF ((RR1.ltn = 3 and RR2.ltn = 2) and
    (RR1.rrc = 16 and RR2.rrc = 16))
THEN w_rrc ← 1.0
     w_ltn ← 0.5
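
The same rule can be written as a short Python sketch. This is not NEXPERT code: the dictionaries standing in for the railroad objects and the `weights` mapping are hypothetical, and the consequent is read as assigning weight 1.0 to rrc and 0.5 to ltn, which is the reading suggested by the discussion of the weights later in this section.

```python
def rule_r3(rr1: dict, rr2: dict, weights: dict) -> bool:
    """Fire rule r3 if its premise holds; return True when the rule fired."""
    if (rr1["ltn"] == 3 and rr2["ltn"] == 2
            and rr1["rrc"] == 16 and rr2["rrc"] == 16):
        weights["rrc"] = 1.0   # values agree: full weight
        weights["ltn"] = 0.5   # values differ: weakened weight
        return True
    return False


# Hypothetical attribute values for the two railroad features.
rr1 = {"ltn": 3, "rrc": 16}
rr2 = {"ltn": 2, "rrc": 16}
weights = {}
fired = rule_r3(rr1, rr2, weights)
```

If the premise does not hold, `weights` is left untouched, so other rules in the subset remain free to set it.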



Knowledge processing can be invoked at any time during the execution 
through the C calls, thus allowing for dynamic object instantiations. 

For example, the two instantiations, Building-1 and Building-2, are
created during execution. Remaining consistent with the frame-based paradigm, all
attributes of the superclass, Building, are inherited by the objects Building-1
and Building-2. The attribute values, in turn, are retrieved from some database
management system.







Fig. 1. Rule Structure 



Once all preprocessing of the information is finished, knowledge processing 
can begin. The rules are written so as to allow inexact matching and reasoning 
to occur. For all practical purposes, requiring corresponding feature attributes 
to completely match would be too restrictive a constraint. 

The matching cannot simply be based on the similarity tables alone as the 
attributes for each feature may not be independent, but may have semantic 
interrelationships. We have represented these interrelationships as rules in our 
expert system. 

The return values from the expert system are weights associated with the
attributes. These weights can either strengthen or weaken the matching
criteria, as the previous code segment demonstrates. The
weight w_ltn for ltn is weakened since the attribute values are not equal. Thus,
ltn has less significance to the overall matching process than rrc, whose
respective weight is significantly larger. In short, this phase of the matching
routine attempts to capture the inherent semantic relationships that exist in the
nonspatial attributes.

4 Conclusion 

A method for utilizing the nonspatial characteristics of geographical data for
performing feature matching was presented. Our objective was to identify,
represent, and reason with the nonspatial attributes for performing feature
matching. Encouraged by our results, we expect to further develop feature matching
techniques so as to accommodate the spatial and topological characteristics of the
data.







References 

Bur96. P. Burrough and A. Frank, eds., Geographic Objects with Indeterminate
Boundaries, GISDATA Series, Vol. 2, Taylor and Francis, London, UK,
1996.

Def93. Defense Mapping Agency (DMA). Military Standard: Vector Product
Formats, (1993) Draft Document No. MIL-STD-2407, (1994) Draft Military
for Digital Nautical Chart, MIL-D-89023, (1994) Draft Military for Urban
Vector Smart Map, MIL-U-89035, (1994) Draft Military for Vector Smart
Map, MIL-U-89039, (1995) Draft Military for World Vector Shoreline,
MIL-W-89012A. Defense Mapping Agency, Fairfax, VA.

Daw95. T. Dawson and C. Jones, Representing and Analyzing Fuzzy Natural
Features in GIS, 405-412, Ninth Annual Symp. on Geographical Information
Systems, 1995.

Fol97. H. Foley, F. Petry, M. Cobb, and K. Shaw. Using Semantic Constraints for
Improved Conflation in GIS. 7th Intl Fuzzy Set Assoc World Conf. Prague.

Rob88. V. Robinson, Implications of Fuzzy Set Theory for Geographic Databases,
Computers, Env, Urban Systems, 12, 89-98, 1988.

Saa88. A. Saalfeld, Conflation: Automated Map Compilation. Intl Journal
of Geographical Information Systems 2(3), 1988.



Modeling of, and Reasoning with Recurrent Events 
with Imprecise Durations 



Stanislav Kurkovsky¹ and Rasiah Loganantharaj²

¹ Department of Computer Science, Columbus State University
kurkovsky_stan@colstate.edu
² Automated Reasoning Laboratory, Center for Advanced Computer Studies
University of Louisiana at Lafayette
logan@cacs.usl.edu

Abstract. In this paper we study how the framework of Petri nets can be 
extended and applied to study recurrent events. We use possibility 
theory to realistically model temporal properties of the recurrent 
processes being modeled by an extended Petri net. Such temporal 
properties include time-stamps stored in tokens and durations of firing 
the transitions. We apply our method to model the recurrent behavior of 
an automated manufacturing cell. 



1. Introduction 

Many events occurring in the real world are repetitive in nature, that is, an event
occurs more than once over a span of time. There have been numerous attempts to study
recurrent events in terms of qualitative relations among events and tasks. It is possible
to have infinite relations among tasks in repetitive events [21]. In this paper, we focus
on a repetitive event in which the task sequence satisfies the given quantitative
constraints of the event. While the sequence of tasks in the repetitive event remains
the same, the duration of each repetition may differ, since each task in each cycle may
take a different time to complete due to the availability of different equipment,
environmental factors, etc. The imprecision of the task duration must be modeled.
Further, we need a formal mechanism to model the sequence of tasks or to maintain the
partial order constraints imposed by the event. Once each task is appropriately
represented with an imprecise duration and we have a formal mechanism to maintain the
task precedence relation in an event, we need to reason with the duration for a given
specified cycle of the repetitive events. In this paper, we use fuzzy logic to represent
imprecise durations and extend the Petri net framework to maintain precedence
requirements among the tasks as well as to reason with the distance among
repetitions of events.

The paper is organized as follows. We provide the background information on 
Petri nets and representing recurrent events in section 2, which is followed by a 
review on timed Petri nets. In section 4, we describe possibilistic representation of 
time, which is followed by presenting algorithms for timed Petri nets with 
possibilistic time. Section 6 presents the results of experiments on modeling a system 
exhibiting a recurrent behavior. In section 7 we summarize the conclusion of this 
paper. 



R. Loganantharaj et al. (Eds.): lEA/AIE 2000, LNAI 1821, pp. 272-283, 2000. 
© Springer- Verlag Berlin Heidelberg 2000 







2. Background Information of Petri Net Theory 

Ever since the introduction of Petri net theory by Petri in his Ph.D. dissertation,
it has been widely used for modeling dynamic systems in the most diverse domains.
A Petri net is a graph-based structure consisting of places and transitions. Each
transition is connected by, and connected to, at least one place. When modeling
tasks using a Petri net, each place that connects to a transition corresponds to a
precondition and the transition corresponds to a task. When a token is in a place, the
precondition denoted by the place is said to be satisfied. Consider a partial order of
tasks A < C, B < C (the tasks A and B must be completed before C). When
transition T_A fires (task A completes), it generates a token that is placed in P_A.
Similarly, when transition T_B fires (task B completes), it generates a token that is
placed in P_B. When there is at least one token in both P_A and P_B, the transition T_C is
enabled. Thus the following Petri net models the partial order of
tasks A < C, B < C (Figure 1).

Fig. 1. Petri net graph
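
The enabling condition for the partial order A < C, B < C can be checked with a minimal Python sketch of an ordinary (untimed) Petri net. The `PetriNet` class is an illustration, not the authors' implementation, and the start places `S_A` and `S_B` and the output place `P_C` are hypothetical names added so that every transition has an input and an output place.

```python
class PetriNet:
    """Ordinary Petri net with unit arc weights and a token-count marking."""

    def __init__(self, marking):
        self.marking = dict(marking)       # place name -> number of tokens
        self.transitions = {}              # name -> (input places, output places)

    def add_transition(self, name, inputs, outputs):
        self.transitions[name] = (list(inputs), list(outputs))

    def enabled(self, name):
        """A transition is enabled when every input place holds a token."""
        inputs, _ = self.transitions[name]
        return all(self.marking.get(p, 0) >= 1 for p in inputs)

    def fire(self, name):
        """Remove one token from each input place, add one to each output place."""
        if not self.enabled(name):
            raise RuntimeError(f"transition {name} is not enabled")
        inputs, outputs = self.transitions[name]
        for p in inputs:
            self.marking[p] -= 1
        for p in outputs:
            self.marking[p] = self.marking.get(p, 0) + 1


net = PetriNet({"S_A": 1, "S_B": 1})       # hypothetical start places
net.add_transition("T_A", ["S_A"], ["P_A"])
net.add_transition("T_B", ["S_B"], ["P_B"])
net.add_transition("T_C", ["P_A", "P_B"], ["P_C"])

enabled_initially = net.enabled("T_C")     # neither A nor B has completed
net.fire("T_A")
enabled_after_a = net.enabled("T_C")       # B is still outstanding
net.fire("T_B")
enabled_after_b = net.enabled("T_C")       # both preconditions now hold
net.fire("T_C")
```

In this sketch T_C becomes enabled only after both T_A and T_B have fired, regardless of the order in which they fire.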

Modeling Non-deterministic Behavior 

Let us use the automated manufacturing cell shown in Figure 2 to illustrate how to
model the non-deterministic behavior of a dynamic system. The cell is composed of three
machines, a robot, and input (In) and output (Out) buffers. Parts come into the
cell and are stored in buffer In of unlimited capacity. Parts can be processed by any
one of the three machines (chosen non-deterministically), namely
Machine 1 (M1), Machine 2 (M2), or Machine 3 (M3). Each processed part is removed from
the buffer of the respective machine and is placed in buffer Out of unlimited capacity.
The robot transfers parts between the buffers and machines. The robot is non-preemptive,
and at each moment it can be used either to load a part from buffer In to any of the
machines or to unload a processed part from any of the machines to buffer Out.




Fig. 2. An automated manufacturing cell and the corresponding Petri net graph. 



A cyclic event has the following tasks: the robot arm picks a part from the In buffer 
and then transfers it to the buffer of one of the three machines. Mi, M 2 , and Ms. Once 








274 Stanislav Kurkovsky and Rasiah Loganantharaj 



the part is processed by the selected machine, the robot picks the part and leaves it in 
the output buffer. Let us consider the portion of the Petri net that models the robot 
picking a part from the In buffer and leaving it in the buffer of any one of the three 
machines. Once the part is placed in the buffer of a machine, the robot becomes free 
for other activities. The availability of a part in the In buffer is modeled as a token 
placed in p1. Similarly, the availability of the robot is modeled by placing a token in 
place ps. The transitions t1, t2 and t3 respectively represent the activity of placing a part in 
the buffer of machines 1, 2 and 3. Initially the three transitions t1, t2 and t3 are 
enabled, but only one can be fired at a time. When more than one transition is 
enabled, one of the enabled transitions is selected either randomly or using some 
heuristic bias. This is how a non-deterministic process is modeled in a Petri net. 
Once a transition is fired, a token is removed from each incoming place and a 
token is placed in each outgoing place. When t2 fires, a pair of tokens, one from p1 
and another from ps, is removed and a token is placed in each of the outgoing places 
of t2. The problem is somewhat more complicated if we impose the constraint that the 
buffer size of machines 1, 2 and 3 is exactly one. 
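
The non-deterministic choice described above can be illustrated with a minimal sketch of an untimed Petri net. The place names (part_in, robot_free, buf_m1, ...) are hypothetical and do not correspond to the labels of Figure 2; every arc is assumed to have weight one, and the robot place is assumed to be both an input and an output of each loading transition, so that the robot becomes free again after the part is placed.

```python
import random

# Each transition maps to (input places, output places); arc weights are assumed to be 1.
transitions = {
    "t1": (["part_in", "robot_free"], ["buf_m1", "robot_free"]),
    "t2": (["part_in", "robot_free"], ["buf_m2", "robot_free"]),
    "t3": (["part_in", "robot_free"], ["buf_m3", "robot_free"]),
}

def enabled(marking, transitions):
    """Names of the transitions whose every input place holds at least one token."""
    return [name for name, (inputs, _) in transitions.items()
            if all(marking[p] >= 1 for p in inputs)]

def fire(marking, transitions, name):
    """Remove one token from each input place and add one to each output place."""
    inputs, outputs = transitions[name]
    for p in inputs:
        marking[p] -= 1
    for p in outputs:
        marking[p] += 1

def step(marking, transitions, rng=random):
    """Fire one enabled transition chosen at random; return its name, or None."""
    candidates = enabled(marking, transitions)
    if not candidates:
        return None
    choice = rng.choice(candidates)  # random selection; a heuristic bias could replace it
    fire(marking, transitions, choice)
    return choice

marking = {"part_in": 1, "robot_free": 1, "buf_m1": 0, "buf_m2": 0, "buf_m3": 0}
print(step(marking, transitions), marking)
```

With a single part in the input buffer all three transitions are enabled, exactly one of them fires, and afterwards none is enabled because the part token has been consumed.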

Modeling Recurrent Events 

We have described the tasks that occur in a single instance of the event. If the event repeats 
continually, the robot is free to take parts and place them in an empty buffer of a 
machine. The robot then performs one of the following activities: (1) it removes 
a processed part from the buffer of a machine and places it in the Out buffer, or 
(2) if at least one machine is ready to process a part, it removes a 
part from the In buffer and places it in the buffer of a machine that is ready. To model this 
continuously repeated activity, the In buffer is initialized with an unlimited number of 
tokens. The Petri net shown inside the contoured area in Figure 2 
models the repeated event of processing parts by the robot. 

3. Extensions to Petri Nets to Incorporate Time 

A qualitative notion of time is implicitly embedded in the Petri net framework. Only 
one firing of a transition can occur at a time, and it is associated with one clock cycle 
or tick of a certain internal clock. To capture the quantitative notion of time, “external 
time” (or an “external clock”) was introduced in [2]. In such a representation, the 
occurrence of an event depends not only on the marking, but also on the time elapsed 
since the occurrence of some other events. Having a single time line subsumes 
both internal and external times. There are several proposals for incorporating the notion 
of time into every component of a Petri net [3, 19]. 

Because any Petri net evolves by moving tokens between places and 
the enabling of transitions depends on the availability of tokens, it seems quite 
natural to associate time with tokens [15, 27, 28]. In most cases, the enabling of a 
transition depends on the timestamps of the tokens. A timestamp may be 
interpreted as the age of a particular token (how much time has elapsed since it was 
produced), or as the age of a sequence of tokens generated by a series of 
transition firings. 

Time can be associated with transitions too [4, 10, 27]. Possible interpretations of a 
time θ associated with a transition t include: t may (or must) fire only after θ time 




Modeling of, and Reasoning with Recurrent Events with Imprecise Durations 275 



units have passed since t became enabled (θ is relative); t may fire only during the time 
interval θ (θ is absolute); the duration of firing of t is θ (θ is assumed to be relative). 
In this case t may (or must) start firing as soon as it becomes enabled. 

Waiting time can be associated with places [8, 28]. Once a token has been added to 
a place, it will not contribute to enabling any transition before the waiting time 
associated with that place has elapsed. Time can also be associated with arcs [18]. In 
this case it is interpreted as a period of time that must elapse until a token will arrive 
from a place to a transition or vice versa. This representation is equivalent to 
representing time as the duration of firing a transition. Associating time with arcs 
and/or places instead of transitions simply changes the way in which a Petri net is 
interpreted. In any case (time associated with transitions, places or arcs) the semantics 
of a Petri net are defined in a similar way. 

4. Possibilistic Representation of Time 

Dubois and Prade [13, 14] use possibility theory to model and manage temporal 
knowledge that involves imprecisely known information. This approach uses points as 
a temporal primitive. Imprecisely known dates are represented as fuzzy sets with a 
unimodal possibility distribution over the temporal axis. Fuzzy temporal intervals 
are derived from the fuzzy dates that limit the time span during which an 
event occurs. A typical possibilistic temporal interval representing a fuzzy 
duration may be approximated by a trapezoidal shape. 

The approach proposed by Dubois and Prade establishes a very solid foundation 
for applying possibility theory to temporal reasoning by providing a 
possibilistic representation of temporal primitives. Trapezoidal approximations of 
possibilistic distributions, however, do not provide enough flexibility for 
modeling mathematical functions. They are overly restrictive on the resulting 
function and are hardly useful for piecewise approximation. To approximate 
functions and to minimize the number of elementary intervals participating in the 
approximation, it is proposed to represent possibilistic distributions using alternative 
trapezoidal shapes (Figure 3), in which the vertical edges are parallel to each other, 
the lower edge coincides with the horizontal axis, and the last edge (the slope) is arbitrary. 
Such an alternative trapezoidal shape T is determined by four parameters: 
$T = \{a, b, h_1, h_2\}$. Depending on the inclination of the slope (i.e. the sign of $h_1 - h_2$) 
we distinguish between L-trapezoidal ($h_1 < h_2$) and R-trapezoidal ($h_1 > h_2$) shapes. 
Special cases of these trapezoidal shapes include the rectangular ($h_1 = h_2$), L-triangular 
($h_1 = 0$) and R-triangular ($h_2 = 0$) shapes. 
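
The alternative trapezoidal shape can be sketched in a few lines of code. This is only an illustration under stated assumptions: the class name AltTrapezoid is hypothetical, and it is assumed that h1 is the height of the vertical edge at a and h2 the height of the vertical edge at b, with the possibility varying linearly in between and equal to zero outside [a, b].

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AltTrapezoid:
    a: float   # left end of the support
    b: float   # right end of the support
    h1: float  # assumed: height of the vertical edge at a
    h2: float  # assumed: height of the vertical edge at b

    def possibility(self, x: float) -> float:
        """Possibility of x: linear between (a, h1) and (b, h2), zero outside [a, b]."""
        if x < self.a or x > self.b:
            return 0.0
        if self.a == self.b:
            return max(self.h1, self.h2)
        return self.h1 + (self.h2 - self.h1) * (x - self.a) / (self.b - self.a)

    def kind(self) -> str:
        """Classify the shape by comparing h1 and h2, as in the text."""
        if self.h1 == self.h2:
            return "rectangular"
        if self.h1 == 0:
            return "L-triangular"
        if self.h2 == 0:
            return "R-triangular"
        return "L-trapezoidal" if self.h1 < self.h2 else "R-trapezoidal"

rising = AltTrapezoid(a=4.0, b=6.0, h1=0.0, h2=1.0)
print(rising.kind(), rising.possibility(5.0))
```

A function can then be approximated piecewise by placing several such shapes side by side, each covering one elementary interval.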



Fig. 3. L-trapezoidal and R-trapezoidal possibilistic distributions 

Fig. 4. Approximation of a function using L- and R-trapezoidal shapes 







algorithm FireTransition(PN) 
input:  Petri network PN 
output: Success or NoTransitionToFire 

begin 
    Token MinToken ← +Infinity 
    Transition Firing ← ∅ 
    for all transitions T of PN begin 
        Token MaxToken ← -1 
        assume T is enabled 
        for all input places P of transition T begin 
            if number of tokens in P < #(P, Input(T)) then T is not enabled 
            else if MaxToken < max(tokens in P) then MaxToken ← max(tokens in P) 
        end 
        if T is enabled and MinToken > MaxToken and 
           MinToken > enabling time of T then 
            Firing ← T 
            MinToken ← max(MaxToken, enabling time of T) 
        end if 
    end 
    if Firing = ∅ then return NoTransitionToFire 
    Token NewToken ← (MinToken + duration of Firing) 
    update enabling time of Firing using NewToken 
    for all input places P of Firing 
        remove minimal token from P 
    for all output places P of Firing 
        add token NewToken into P 
    return Success 
end 



Fig. 8. Algorithm of firing a single transition in a possibilistic timed Petri net. 

A transition may fire only if it is enabled, that is, when each of its input places holds at 
least as many tokens as there are arcs from the place to the transition. Figure 8 presents an 
algorithm for firing a transition, which starts by finding all enabled transitions in the 
network. Enabled transitions are found according to the rule given above, which is 
taken directly from Petri net theory. Once all enabled transitions are 
found, the algorithm must determine which of them will fire. This is a source of 
non-determinism in classical Petri net theory. In our approach we can use the temporal 
information stored in the tokens to determine which enabled transition will fire. In 
each input place of every transition we find the token MinToken with the earliest 
timestamp. This tells when the first available token arrived at a given place. The 
timestamps of these tokens are used for synchronizing enabled transitions. 

For the set of input places of each transition we find the maximum (a synchronized 
token) among the previously found MinTokens. This gives the earliest time at which 
tokens are available in all input places of an enabled transition and, therefore, the time 
at which the transition becomes enabled to fire. The firing transition is thus 
the enabled transition whose synchronized token is minimal. If there are no enabled 
transitions, the algorithm returns a failure. Otherwise, the selected transition fires. 
A minimal token is removed from each input place of the firing transition. Adding the 
duration of the firing transition to the synchronized token generates a new token, which 
is added to all output places of the firing transition. 

The algorithm for finding a transition and firing it is extended, as shown in 
Figure 8, to take into account that tokens are timestamped, that transitions have durations, 
and that a transition cannot fire again until a period of time equal to the duration of its 
firing has passed. 
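
The selection rule described in the prose above can be sketched as follows. This is a simplified illustration, not the algorithm of Figure 8 itself: timestamps and durations are plain numbers rather than possibilistic distributions, the synchronized token of a transition is computed from the earliest usable token in each input place, and the names TimedTransition and fire_transition are hypothetical. Every transition is assumed to have at least one input place.

```python
import math
from dataclasses import dataclass

@dataclass
class TimedTransition:
    name: str
    inputs: dict                # input place -> number of arcs from that place
    outputs: list               # output places
    duration: float             # crisp firing duration (assumption: no fuzziness)
    enabling_time: float = 0.0  # earliest time at which the transition may fire again

def fire_transition(marking, transitions):
    """Fire the enabled transition that can start earliest.

    marking maps each place to a list of token timestamps.
    Returns the name of the fired transition, or None if nothing is enabled.
    """
    chosen, chosen_time = None, math.inf
    for t in transitions:
        # Enabled only if each input place holds at least as many tokens as arcs.
        if any(len(marking[p]) < k for p, k in t.inputs.items()):
            continue
        # Earliest usable token per input place; the latest of these synchronizes the inputs.
        sync = max(sorted(marking[p])[k - 1] for p, k in t.inputs.items())
        start = max(sync, t.enabling_time)
        if start < chosen_time:
            chosen, chosen_time = t, start
    if chosen is None:
        return None
    new_token = chosen_time + chosen.duration
    chosen.enabling_time = new_token
    for p, k in chosen.inputs.items():
        for _ in range(k):
            marking[p].remove(min(marking[p]))  # consume the earliest tokens
    for p in chosen.outputs:
        marking[p].append(new_token)
    return chosen.name

load = TimedTransition("load", {"p_in": 1, "p_robot": 1}, ["p_m", "p_robot"], 0.5)
process = TimedTransition("process", {"p_m": 1}, ["p_done"], 5.0)
marking = {"p_in": [0.0, 0.0], "p_robot": [0.0], "p_m": [], "p_done": []}
while (fired := fire_transition(marking, [load, process])) is not None:
    print(fired, marking)
```

Ties between transitions that could start at the same time are resolved here by list order; a random choice would be an equally valid reading of the text.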







6. Experimental Modeling of Recurrent Events with Petri Nets 

The following sections describe simulation experiments performed on the 
automated manufacturing cell described earlier, each with different temporal characteristics. 
The simulation is performed as an iterated execution of the algorithm that fires a 
transition in a possibilistic timed Petri net (Figure 8). Simulated time is reflected in the 
timestamps of the tokens populating the network and in the enabling times of the transitions. 
Each experiment runs until a target simulated time is reached (1000 
minutes in all experiments); in practice, the simulation is run for a pre-specified number 
of firings large enough to exceed the target time. Temporal information 
about the firing of each transition is recorded in log files, which are then processed and 
analyzed to build the throughput and utilization graphs of each entity of the Petri net. 

The goal of these experiments is to study how the temporal characteristics of different 
entities change the degree of non-determinism and how that affects the behavior of 
the cell. We analyze the utilization and throughput of each individual entity of the 
automated manufacturing cell, and of the cell as a unit, during the simulation of its 
operation over a long period of simulated time (1000 minutes). In 
Experiment 1 the processing times of all machines are identical, as are the robot’s 
serving times. In Experiment 2 the machine processing times vary insignificantly, while 
the robot’s serving times are identical for each machine. In Experiment 3 the machine 
processing times vary significantly and are comparable to the robot’s serving times, 
which are the same as in the previous experiment. Based on the results of each 
experiment, graphs of the utilization and throughput of each entity of the cell are 
built. The throughput graphs are expected to grow linearly for the cell as a unit and 
for each individual entity. After some initial period of simulated time, the 
utilization of each entity of the cell is expected to tend to a constant value. 
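
The two quantities plotted in the experiments can be computed from a firing log along the following lines. The definitions used here are assumptions, since this part of the paper does not spell them out: throughput at time t is taken to be the number of firings completed by t, and utilization is the fraction of the interval [0, t] during which the entity was busy. The log format (transition name, start time, end time) is likewise hypothetical.

```python
def throughput(log, transition, t):
    """Number of firings of `transition` completed by simulated time t."""
    return sum(1 for name, start, end in log if name == transition and end <= t)

def utilization(log, transitions, t):
    """Fraction of [0, t] during which any transition in `transitions` was firing.

    Assumes the firings of one entity never overlap in time.
    """
    if t <= 0:
        return 0.0
    busy = sum(min(end, t) - start
               for name, start, end in log
               if name in transitions and start < t)
    return busy / t

# Hypothetical log of one machine whose processing transition is called "t4".
log = [("t4", 0.5, 5.5), ("t4", 6.0, 11.0)]
print(throughput(log, "t4", 10.0), utilization(log, {"t4"}, 10.0))
```

Evaluating both functions on a grid of time points yields curves of the kind shown in the throughput and utilization figures.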

Experiment 1. 

Table 1 presents the numeric data for Experiment 1 (durations of the operations of the 
automated manufacturing cell in the form of possibilistic temporal distributions). 
The durations of all robot operations are fixed at 30 seconds. The durations of the 
operations of all three machines are also identical (between 4 and 6 minutes, but most 
likely 5 minutes). Figure 9 shows that the throughput of all machines, the robot, and the 
entire cell increases linearly over time, as expected. The throughput graphs of all three 
machines coincide because of their identical temporal characteristics. Figure 10 shows 
the utilization of the machines and the robot. After an introductory period (about 
300 minutes) the utilization graphs of all three machines stabilize at the same value, 
again because of the identical temporal characteristics of the machines. 



Table 1. Numeric data for Experiment 1. 

Transition   Possibilistic temporal distribution 
t1           [30sec(1)] 
t2           [30sec(1)] 
t3           [30sec(1)] 
t4           [4min(0), 5min(1), 6min(0)] 
t5           [4min(0), 5min(1), 6min(0)] 
t6           [4min(0), 5min(1), 6min(0)] 
t7           [30sec(1)] 
t8           [30sec(1)] 
t9           [30sec(1)] 



Fig. 9. Throughput of the machines, robot, and the entire cell in Experiment 1. 

Fig. 10. Utilization of the machines and robot in Experiment 1. (Horizontal axis: time, 
minutes; series: Machine 1, Machine 2, Machine 3, Robot.) 



Experiment 2. 

Table 2 presents the numeric data for Experiment 2. The durations of the robot’s 
operations vary from 20 to 40 seconds. The durations of the operations of the machines 
vary, but they are significantly larger than the durations of the robot’s operations. 
Figure 11 shows that the throughput of the machines, the robot, and the entire cell 
increases linearly over time, as expected. Unlike in the previous experiment, the throughput 
graphs of the machines do not coincide because of the different temporal characteristics of 
the machines. Figure 12 shows the utilization of the machines and the robot. After an 
introductory period (approximately 300 minutes) the utilization graphs of all three 
machines and the robot stabilize at some fixed values. 

Table 2. Numeric data for Experiment 2. 

Transition   Possibilistic temporal distribution 
t1           [20sec(1)] 
t2           [30sec(1)] 
t3           [40sec(1)] 
t4           [3min(0), 4min(1), 5min(0)] 
t5           [4min(0), 5min(1), 6min(0)] 
t6           [5min(0), 6min(1), 7min(0)] 
t7           [40sec(1)] 
t8           [30sec(1)] 
t9           [20sec(1)] 




Fig. 11. Throughput of the machines, robot and the entire cell in Experiment 2. (Horizontal 
axis: time, minutes; series: Machine 1, Machine 2, Machine 3, Robot, Cell.) 

Fig. 12. Utilization of the machines and robot in Experiment 2. (Horizontal axis: time, 
minutes, from 0 to 1000; vertical axis ticks from 0.4 to 0.85.) 



Experiment 3. 

Table 3 presents the numeric data for Experiment 3 (durations of the operations of 
the automated manufacturing cell in the form of possibilistic temporal distributions). 
The durations of the robot’s transfers of parts to the machines are fixed at 1 minute; 
the durations of the robot’s transfers of parts from the machines are fixed at 30 seconds. 
The durations of the operations of the machines differ and are comparable with the 
durations of the operations of the robot. 

Figure 13 shows that the throughput of the machines, the robot, and the entire cell 
increases linearly over time, as expected. As in the previous experiment, the 
throughput graphs of the machines do not coincide because of the different temporal 
characteristics of the machines. Figure 14 shows the utilization of the machines and the 
robot. After an introductory period (approximately 300 minutes) the utilization 
graphs of all three machines and the robot stabilize at some fixed values. 

Table 3. Numeric data for Experiment 3. 

Transition   Possibilistic temporal distribution 
t1           [1min(1)] 
t2           [1min(1)] 
t3           [1min(1)] 
t4           [2min(0), 3min(1), 4min(0)] 
t5           [4min(0), 5min(1), 6min(0)] 
t6           [6min(0), 7min(1), 8min(0)] 
t7           [30sec(1)] 
t8           [30sec(1)] 
t9           [30sec(1)] 




Fig. 13. Throughput of the machines, robot, and the entire cell in Experiment 3. 

Fig. 14. Utilization of the machines and robot in Experiment 3. (Horizontal axis: time, 
minutes; series: Machine 1, Machine 2, Machine 3, Robot.) 

The results of all three experiments coincide with the expected outcomes: 
the throughput of each entity of the cell grows linearly and the utilization stabilizes at a 
certain value after some initial time. 



7. Conclusions 

In this paper, we have shown how to model and reason with repetitive events using 
Petri nets. In general, the task sequence of an event may not be preserved in the 










repeated occurrence of the same event. Worse yet, the set of tasks occurring in an 
event may not be the same in a repeated occurrence of that event. Petri 
nets seem to be an ideal mechanism for modeling such complicated repetitive events, 
since non-deterministic choices can be modeled elegantly in a Petri net. In spite of 
these advantages, Petri nets do not have a well-defined notion of time. There 
are, however, a number of ad hoc proposals for incorporating time into Petri nets, 
which range from associating time with tokens to associating time with 
arcs. Nevertheless, none of these approaches uses time to represent imprecise 
durations of transitions or represents the marking of the network as a set of tokens 
timestamped with the imprecise times of their creation. The approach adopted in our 
framework offers a more realistic way of modeling real-world problems. 

We use an automated manufacturing cell as a running example throughout the 
paper. The duration of each task, which is represented by a possibilistic distribution, 
is modeled as a transition in the extended Petri net. The tokens represent the 
completion times of the tasks and hence they are associated with possibilistic 
distributions. In a typical Petri net, synchronization, or determining the 
enabling time of a transition, is quite easy. We provide an elegant solution to enabling 
a transition whose incoming tokens carry possibilistic distributions. Using an example, 
we have shown the expressiveness of this approach, its advantage over other 
approaches and its applicability to modeling recurrent systems. The results of the 
experiments conducted with the created model coincide with the expected outcomes: 
throughput is a linear function of time and utilization tends to a certain constant. 
These facts allow us to conclude that our approach is valid for modeling real-world 
problems where temporal properties, such as durations of tasks, can be modeled with a 
degree of imprecision. 



References 

[1] J. Allen. Maintaining Knowledge about Temporal Intervals. Communications of the ACM, 
26:832-843, 1983. 

[2] I. Bestuzheva, V. Rudnev. Timed Petri Nets: Classification and Comparative Analysis. 
Automation and Remote Control, 51(10):1308-1318, Consultants Bureau, New York, 
1990. 

[3] C. Brown, D. Gurr. Temporal Logic and Categories of Petri Nets. In A. Lingass, R. 
Karlsson, editors, Automata, Languages and Programming, pp. 570-581, Springer-Verlag, 
New York, 1993. 

[4] J. Cardoso, H. Camargo, editors. Fuzziness in Petri Nets, Physica Verlag, New York, NY, 
1999. 

[5] J. Cardoso, R. Valette, D. Dubois. Fuzzy Petri Nets: An Overview. In G. Rosenberg, 
editor, Proceedings of the 13th IFAC World Congress, pp. 443-448, San Francisco, CA, 30 
June - 5 July 1996. 

[6] J. Cardoso, R. Valette, D. Dubois. Possibilistic Petri Nets. IEEE Transactions on Systems, 
Man and Cybernetics, Part B: Cybernetics, October 1999, Vol. 29, N 5, pp. 573-582. 

[7] J. Carlier, P. Chretienne. Timed Petri Net Schedules. In G. Rozenberg, editor, Advances in 
Petri Nets, pp. 642-666, Springer-Verlag, New York, 1988. 

[8] J. Coolahan, N. Roussopoulos. Timing Requirements for Time-driven Systems Using 
Augmented Petri Nets. IEEE Transactions on Software Engineering, 9(5):603-616, 1983. 




[9] R. Dechter, I. Meiri, J. Pearl. Temporal Constraint Networks. Artificial Intelligence, 
49:61-95, 1991. 

[10] M. Diaz, P. Senac. Time Stream Petri Nets: a Model for Timed Multimedia Information. 
In R. Valette, editor, Application and Theory of Petri Nets-94, pp. 219-238, 
Springer-Verlag, New York, 1994. 

[11] D. Dubois, H. Prade. Possibility Theory. Plenum Press, New York, 1988. 

[12] D. Dubois, H. Prade. Processing Fuzzy Temporal Knowledge. IEEE Transactions on 
Systems, Man and Cybernetics, 19(4), July/August 1989. 

[13] D. Dubois, J. Lang, H. Prade. Timed Possibilistic Logic. Fundamenta Informaticae, 
15(3,4):211-234, 1991. 

[14] D. Dubois, H. Prade. Processing Fuzzy Temporal Knowledge. IEEE Transactions on 
Systems, Man and Cybernetics, 19(4):729-744, 1989. 

[15] M. Felder, A. Morzenti. A Temporal Logic Approach to Implementation and Refinement 
of Timed Petri Nets. In D. Gabbay, editor, Proceedings of International Conference on 
Temporal Logic ICTL-94, Bonn, Germany, July 11-14, pp. 365-381, Springer-Verlag, New 
York, 1994. 

[16] P. Fortemps. Jobshop Scheduling with Imprecise Durations: A Fuzzy Approach. IEEE 
Transactions on Fuzzy Systems, 5(4):557-569, 1997. 

[17] L. Godo, L. Vila. Possibilistic Temporal Reasoning Based on Fuzzy Temporal 
Constraints. In C. Mellish, editor, Proceedings of IJCAI-95, pp. 1916-1922, Montreal, 
Canada, 20-25 August, Morgan Kaufmann, San Francisco, CA, 1995. 

[18] H. Hanisch. Analysis of Place/Transition Nets with Timed Arcs and Its Application to 
Batch Process Control. In M. Marsan, editor, Application and Theory of Petri Nets-93, pp. 
282-299, Springer-Verlag, 1993. 

[19] E. Kindler, T. Vesper. ESTL: A Temporal Logic for Events and States. In J. Desel, M. 
Silva, editors, Application and Theory of Petri Nets-98, pp. 365-384, Springer-Verlag, 
New York, 1998. 

[20] S. Kurkovsky. Possibilistic Temporal Propagation. Ph.D. dissertation, University of 
Southwestern Louisiana, 1999. 

[21] P. Ladkin. Time Representation: A Taxonomy of Interval Relations. Proceedings of Fifth 
National Conference on Artificial Intelligence, pp. 360-366. American Association for 
Artificial Intelligence, 1996. 

[22] R. Loganantharaj, S. Giambrone. Probabilistic Approach for Representing and Reasoning 
with Repetitive Events. In J. Stewman, editor, Proceedings of FLAIRS-95, pp. 26-30, 
Melbourne, FL, 27-29 April 1995. 

[23] R. Loganantharaj, S. Giambrone. Representation of, and Reasoning with, Near-Periodic 
Recurrent Events. In F. Anger, H. Guesgen, J. van Benthem, editors, Proceedings of 
IJCAI-95 Workshop on Spatial and Temporal Reasoning, pp. 51-56, Montreal, Canada, 
20-25 August 1995, Morgan Kaufmann, San Mateo, CA, 1995. 

[24] P. Merlin, D. Farber. Recoverability of Communication Protocols. IEEE Transactions on 
Communications, 24(9):541-580, 1989. 

[25] J. Peterson. Petri Net Theory and the Modeling of Systems. Prentice Hall, 1981. 

[26] C. Ramamoorthy, G. Ho. Performance Evaluation of Asynchronous Concurrent Systems 
Using Petri Nets. IEEE Transactions on Software Engineering, 6(5):440-449, 1980. 

[27] M. Tanabe. Timed Petri Nets and Temporal Linear Logic. In P. Azema, G. Balbo, editors, 
Application and Theory of Petri Nets-97, pp. 156-174, Springer-Verlag, New York, 1997. 

[28] M. Woo, N. Qazi, A. Ghafoor. A Synchronization Framework for Communication of 
Pre-orchestrated Multimedia Information. IEEE Networks, 8(1):52-61, 1994. 

[29] Y. Yao. A Petri Net Model for Temporal Knowledge Representation and Reasoning. IEEE 
Transactions on Systems, Man, and Cybernetics, 24(9):1374-1382, 1994. 




Linguistic Approximation and Semantic Adjustment in 
the Modeling Process 



Eric Fimbel 

Centre de Recherche en Neuropsychologie, Institut Universitaire de Geriatrie de Montreal 

efimbel@hotmail.com 



Abstract. The transcription of data into a discrete representation system 
(numerical or qualitative) may be inaccurate either because there exists no exact 
representation or because a concise but inexact description is preferred. This 
kind of inaccuracy is studied within a general framework using Description 
Languages, either Qualitative or Numerical. Two principles are stated: 1) the 
writer always selects a description following a complexity-accuracy tradeoff 
(Linguistic Approximation Principle); 2) in a Description Language, in every 
specific context, the Meaning of an expression optimally represents the set of 
values that it can describe (Semantic Adjustment Principle). Consequently, the 
Meaning of Qualitative Expressions 1) may change in different contexts; 2) can 
generally be determined without introducing arbitrary parameters. The 
corresponding algorithm is presented in the case of Linguistic Modeling, to 
calculate the fuzzy values associated with Qualitative Expressions. 



1 Introduction 

Inaccuracy is an important concern during the modeling process. As stated by 
Zadeh: "as the complexity of a system increases, our ability to make precise and yet 
significant statements about its behavior diminishes until a threshold is reached 
beyond which precision and significance (or relevance) become almost mutually 
exclusive characteristics (...)" [12]. In concrete terms, inaccuracy comes from: 1) the 
data, which may be non-deterministic or inaccurately measured; 2) the transcription of 
the data into a discrete representation system; 3) the model itself (e.g. approximate laws, 
simplistic hypotheses such as the normality of statistical distributions); 4) the derivation 
of the results (e.g. iterative calculus with finite precision). 

Linguistic Inaccuracy results from the transcription of the data into a discrete 
representation system. It may be unavoidable (e.g. Quantization Errors on analog 
data [4] [7]) or it may result from writing conventions (e.g. 1.2344% is usually 
represented as 1.23%). It has three components: 1) Approximation Inaccuracy, when 
the representation differs from the value; 2) Intrinsic Imprecision, when the 
representation may correspond to different values; 3) Semantic Indetermination, when 
the Meaning of the representation is not fully defined. 



Fig. 1 Linguistic Inaccuracy. V: value; E: representation; AA: Approximation Inaccuracy; AI: 
Intrinsic Imprecision; AS: Semantic Imprecision. Total Linguistic Inaccuracy: AA+AI/2+AS 

R. Loganantharaj et al. (Eds.): IEA/AIE 2000, LNAI 1821, pp. 284-289, 2000. 

© Springer-Verlag Berlin Heidelberg 2000 








Linguistic Inaccuracy characterizes a non-deterministic correspondence between the 
values and their representations. The Linguistic Modeling approach [12] [8] uses 
fuzzy sets to handle this fact. The data are described by means of Linguistic Variables, 
whose values are Qualitative Expressions (e.g. "cold", "slightly warm"). The 
expressions correspond to fuzzy sets which quantify an overall imprecision (data and 
Linguistic Inaccuracy, and required precision). As a disadvantage, the definition of the 
fuzzy sets requires many arbitrary parameters or a complex identification process. 

The approach presented here uses a similar framework, but handles the Linguistic 
Inaccuracy and the data inaccuracy differently: 1) the data are fuzzy or exact 
values; 2) they are represented in a Description Language, in which the Meanings of 
the expressions are fuzzy sets; 3) the Domain (i.e. the set of data values) is not 
limited to the Meanings of the expressions; 4) the language is provided with a 
Complexity Function (e.g. the number of decimals in a fixed point notation); 5) the 
Meanings are totally ordered (e.g. "cold" < "hot" < "very hot") but, in the Qualitative 
Languages, they remain unknown, contrary to the Quantitative (Numerical) Languages. 

As in any communicative act, the writer of a model is expected to follow some 
conventional rules so as to be correctly understood. The Linguistic Approximation 
Principle describes such a rule: it states that the description of a value is a tradeoff 
between the complexity of the chosen expression and its accuracy. In concrete terms, 
the writer fixes a Complexity Bound; then, the value is described by one of the 
adjacent candidate expressions (with complexity lower than the bound). In a further 
step, the reader acquires the Complexity Bound from the context or the description 
itself. Then, the Field of the description (i.e. the possible corresponding values) is the 
set of values for which there are no closer candidate expressions. 
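
The writer's and reader's sides of this rule can be sketched for a fixed-point numerical language. The sketch makes several assumptions that go beyond the text: expressions are decimal strings, their Meanings are exact numbers, the Complexity Function counts decimals, candidates are those with complexity at most the bound, and the writer picks the closest candidate. The function names are hypothetical.

```python
def decimals(expr: str) -> int:
    """Complexity of a fixed-point expression: its number of decimals."""
    return len(expr.split(".")[1]) if "." in expr else 0

def describe(value, expressions, bound, meaning=float, complexity=decimals):
    """Writer's side: the candidate (complexity <= bound) closest to the value."""
    candidates = [e for e in expressions if complexity(e) <= bound]
    return min(candidates, key=lambda e: abs(meaning(e) - value))

def in_field(value, expr, expressions, bound):
    """Reader's side: value is in the Field of expr if no other candidate is closer."""
    return describe(value, expressions, bound) == expr

language = ["1", "2", "1.2", "1.3", "1.23", "1.24"]
for bound in (0, 1, 2):
    print(bound, describe(1.2344, language, bound))
```

With a bound of two decimals the value 1.2344 is written as "1.23", which matches the writing convention mentioned in the introduction; lowering the bound trades accuracy for a shorter expression.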

Assuming that the writer follows the conventional rules, the Description Languages 
are expected to be efficient communication tools. The Semantic Adjustment Principle 
gives an efficiency criterion: for any given Complexity Bound, the Meaning of every 
candidate expression optimally represents its Field. The Field of an expression results 
from its Meaning, the Domain and the neighbor candidate expressions. If the Meaning 
is not “centered” in the Field (using a Least Squares criterion), the whole set of 
Meanings may be changed (adjusted) in this specific context. The principle entails 
that, in a continuous and bounded Domain, the candidate expressions must be 
equidistant; in a finite Domain they must correspond exactly to the values. Moreover, 
the usual Numerical Languages are proven to be stable, but in some Qualitative 
Languages the Meanings change with the Complexity Bound. 
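
The claim that adjusted Meanings end up equidistant in a continuous, bounded Domain can be checked numerically. The sketch below is not the algorithm of this paper; it is a Lloyd-style iteration used purely as an illustration, under the assumptions that the Meanings are exact values, the Domain is an interval [lo, hi] weighted uniformly, and the Field of each expression is bounded by the midpoints to its neighbors. Under uniform weighting the Least Squares center of a Field is its midpoint.

```python
def adjust(meanings, lo, hi, iterations=200):
    """Repeatedly re-center each Meaning in its Field on the Domain [lo, hi]."""
    m = sorted(meanings)
    for _ in range(iterations):
        # Field boundaries: Domain ends and midpoints between adjacent Meanings.
        bounds = [lo] + [(x + y) / 2 for x, y in zip(m, m[1:])] + [hi]
        # Least Squares center of each Field under uniform weighting: its midpoint.
        m = [(left + right) / 2 for left, right in zip(bounds, bounds[1:])]
    return m

print(adjust([0.1, 0.2, 0.9], 0.0, 1.0))
```

Starting from three unevenly placed Meanings on [0, 1], the iteration settles near 1/6, 1/2 and 5/6, which are equidistant.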

An application of these results to Linguistic Modeling is presented as an algorithm 
which calculates the Meaning of the Qualitative Expressions for any Complexity 
Bound. This algorithm allows predictive models to be built, based upon fuzzy 
inference mechanisms, without introducing arbitrary parameters. 



2 Definitions, Principles and Results 

2.1 Fuzzy Sets and Centroids 

A Fuzzy Set is defined by a Membership Function $\mu(.)$ with values in the continuum [0,1]. 
Exact values and usual sets are specific cases of fuzzy sets. A Centroid is a fuzzy set whose 
membership function decreases symmetrically from a maximum. Most fuzzy sets can be 
approximated with centroids (using unions, complementations, intersections and 
IF-THEN Fuzzy Rules [9]). Here, the centroids are considered parametric curves, 
characterized by their center, their precision, and some additional parameters which 
define their shape. The space of centroids, CR, is $R \times R^+ \times R^n$, and a centroid c with a 
membership function $\mu_c(.)$ corresponds to a triplet $(\pi(c), \lambda(c), \rho(c))$, where $\pi(c) \in R$ is 
the Center (i.e. the central value), $\lambda(c) \in R^+$ is the Precision, i.e. the distance between 
the Crossing Points (points where the membership function is 0.5), and $\rho(c) \in R^n$ are 
parameters which define the shape, such as the type of curve (e.g. triangular). 
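
As an illustration of the center and precision parameters, here are two membership functions whose Crossing Points (membership 0.5) lie exactly one precision apart. The formulas are assumptions chosen to satisfy that definition; the paper does not give explicit expressions for the curves, and the function names are hypothetical.

```python
import math

def triangular_membership(x, center, precision):
    """Triangular centroid: 1 at the center, 0.5 at center +/- precision/2."""
    return max(0.0, 1.0 - abs(x - center) / precision)

def gaussian_membership(x, center, precision):
    """Gaussian centroid scaled so that membership is 0.5 at center +/- precision/2."""
    return math.exp(-math.log(2.0) * (2.0 * (x - center) / precision) ** 2)

# A centroid centered at 20 with precision 4 has crossing points at 18 and 22.
for x in (18.0, 20.0, 22.0, 24.0):
    print(x, triangular_membership(x, 20.0, 4.0), gaussian_membership(x, 20.0, 4.0))
```

Both curves decrease symmetrically from their maximum at the center, so each one is fully described by a center, a precision, and the choice of curve type as shape parameter.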



Fig. 2 Left: different types of centroids. From left to right: triangular, trapezoidal, gaussian 
(vertical: grade of membership; horizontal: real line). Right: representation in a 
center-precision plane (vertical: precision; horizontal: center). Each point is a centroid; 
dotted lines: corresponding membership functions (triangular) 

A Lexicographic Order center-precision is defined on the centroids by: 

$$v_1 < v_2 \iff \pi(v_1) < \pi(v_2) \ \text{or} \ \big(\pi(v_1) = \pi(v_2) \ \text{and} \ \lambda(v_1) < \lambda(v_2)\big) \tag{1}$$

An Equivalence Relationship (and the synonymy of values) is defined by: 

$$v_1 \sim v_2 \iff \pi(v_1) = \pi(v_2) \ \text{and} \ \lambda(v_1) = \lambda(v_2); \qquad \mathrm{syn}(v) = \{x \in CR \mid x \sim v\} \tag{2}$$

The concept of Adjacency is defined by means of the Lexicographic Order. The 
elements of a set adjacent to a given centroid are defined by: 

$$\forall S \subset CR, \ \forall v \in CR, \quad A(S,v) = \{x \in S \mid \forall y \in S, \ (x < y \le v) \ \text{or} \ (v \le y < x) \Rightarrow y = x\} \tag{3}$$

It can be proven that if v belongs to S, A(S,v) = syn(v), and that if S is dense 
nowhere on CR (e.g. S is finite), any value of CR has adjacent values in 
S ($A(S,v) \neq \emptyset$). 
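
The order and adjacency definitions can be illustrated with a short sketch. It assumes, for illustration only, that a centroid is reduced to a (center, precision) pair, so that Python's built-in tuple comparison coincides with the Lexicographic Order of (1). The function name `adjacent` is hypothetical, and the reading of (3) used here (the synonyms of v if S contains any, otherwise the nearest elements of S on either side of v) is one interpretation of the definition rather than the author's implementation.

```python
def adjacent(S, v):
    """Elements of S adjacent to centroid v, with centroids as (center, precision) pairs.

    Tuples compare lexicographically, which mirrors the center-precision order.
    """
    synonyms = [x for x in S if x == v]
    if synonyms:                       # v has synonyms in S: they are its only neighbours
        return synonyms
    below = [x for x in S if x < v]
    above = [x for x in S if x > v]
    neighbours = []
    if below:
        neighbours.append(max(below))  # closest element under v
    if above:
        neighbours.append(min(above))  # closest element over v
    return neighbours


S = [(0.0, 1.0), (5.0, 1.0), (5.0, 2.0), (10.0, 1.0)]
print(adjacent(S, (5.0, 1.5)))   # between two centroids sharing the same center
print(adjacent(S, (5.0, 2.0)))   # a member of S
print(adjacent(S, (20.0, 0.5)))  # beyond the largest element of S
```

With a finite, non-empty S the returned list is never empty, which matches the remark that any value of CR has adjacent values in such a set.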

2.2 Data and Description Languages 

The data are centroids which represent at the same time the values and the precision 
with which they are known. More complex values (exact or fuzzy) are defined from 
the data by means of formulas, as the solution of equations or as the result of 
calculations or inferences. The data values are represented by means of Description 
Languages. A Description Language E is provided with: 

• a Standard Interpretation Function $\varphi(\cdot): E \to CR$. $\varphi(e)$ is the Standard Meaning 
of e. E is Quantitative (i.e. Numerical) iff $\varphi(\cdot)$ is totally known, and Qualitative 
iff $\varphi(\cdot)$ is totally unknown; 

• an Order between expressions, based on their values. The Order is known in both 
Qualitative and Quantitative languages: 





Linguistic Approximation and Semantic Adjustment in the Modeling Process 287 



$$e_1 < e_2 \ (\text{resp.} \sim) \iff \varphi(e_1) < \varphi(e_2) \ (\text{resp.} \sim); \qquad \mathrm{syn}(e) = \{e' \in E \mid \varphi(e) \sim \varphi(e')\} \tag{4}$$

• a Domain $A \subset CR$, i.e. the set of the values that can be represented in E. It must 
contain the values of the expressions ($\varphi(E) \subset A$). It can be continuous, discrete or 
finite, bounded or not. E is said to be closed iff $\varphi(E) = A$, open iff $\varphi(E) \subset A$; 

• a Complexity Function $\chi(\cdot): E \to \mathbb{N}$, which provides E with a structure: 

$$E_c = \{e \in E \mid \chi(e) \le c\}; \qquad E_0 \subset E_1 \subset \dots; \qquad E = \bigcup_k E_k \tag{5}$$

such that for all k, $E_k$ is not empty and $\varphi(E_k)$ is dense nowhere in CR (e.g. finite, 
or there exists a minimal distance between the values). 

For instance, the language {left, center, right} is Numerical when the expressions 
define rotation angles (e.g. 135°, 90°, 45°), and Qualitative when it is only known that 
"left" < "center" < "right". It is closed when the Domain has exactly three values (e.g. a 
3-state variable), and open when it is continuous (e.g. position of a steering wheel). 

Three complementary concepts are necessary so as to describe the way a Description 
Language E is used: 

• the Valid Descriptions at level c, $d_c(\cdot): CR \to P(E_c)$; this function associates to a 
value the set of expressions of $E_c$ which can describe it at level c; 

• the Field at level c, $\phi_c(\cdot): E_c \to P(CR)$. This function associates to an expression 
the set of values that it can describe at level c: 

$$\forall e \in E_c, \quad \phi_c(e) = \{v \in CR \mid d_c(v) \ni e\} \tag{6}$$

• the Adjusted Interpretations or Interpretations at level c, $\varphi_c(\cdot): E_c \to CR$. These 
context-dependent functions may be different from the Standard Interpretation. 

2.3 The Linguistic Approximation Principle 

The Linguistic Approximation Principle states that for any value v of CR, the Valid 
Descriptions of v at any level c are expressions of $E_c$ whose value is adjacent to v: 

$$\forall v \in CR, \ \forall c \in \mathbb{N}, \quad d_c(v) \subset \varphi^{-1} A(\varphi(E_c), v), \ \text{and by default} \ d_c(v) = \varphi^{-1} A(\varphi(E_c), v) \tag{7}$$

As a consequence, it can be proven that every value of CR has Valid Descriptions at 
any level ($d_c(v) \neq \emptyset$), but the Valid Descriptions may be unstable ($d_c(v) \not\subset d_{c+1}(v)$). 

2.4 The Semantic Adjustment Principle 

The Semantic Adjustment Principle states that at any level c, the Meaning of every 
expression of $E_c$ optimally represents its Field. The representativity is formalized by a 
Least Squares Criterion. Let $m^2(\cdot,\cdot): P(CR) \times CR \to \mathbb{R}$ be defined as follows: 

$$m^2(A,v) = \int_A (\pi(x)-\pi(v))^2 + (\lambda(x)-\lambda(v))^2 \, dx \quad \text{if } A \text{ is continuous} \tag{8}$$

$$m^2(A,v) = \sum_{x \in A} (\pi(x)-\pi(v))^2 + (\lambda(x)-\lambda(v))^2 \quad \text{if } A \text{ is discrete}$$

The Semantic Adjustment Principle states that an Adjusted Interpretation verifies: 

$$\forall e \in E_c, \quad \varphi_c(e) = \arg\min_v m^2(\phi_c(e), v) \tag{9}$$

The Standard Interpretation of the language E, $\varphi(\cdot)$, is stable iff: 

$$\forall c \in \mathbb{N}, \ \forall e \in E_c, \quad \varphi(e) = \arg\min_v m^2(\phi_c(e), v) \tag{10}$$

The Adjusted Interpretations can be characterized in several important cases. In 
particular, if the Domain is continuous and bounded, the centers of the Meanings of 
the expressions of Ec are equidistant, and for expressions with the same center, the 
precisions are equidistant. This result is used in the algorithm presented in section 3. 
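
As a small illustration of the Least Squares Criterion in the discrete case, the sketch below reads the criterion as a minimization of the summed squared deviations in center and precision. Under that reading the optimal representative of a finite Field is the componentwise mean of its members. The function names and the sample Field are hypothetical, and centroids are again reduced to (center, precision) pairs.

```python
from statistics import fmean


def squared_deviation(field, v):
    """Discrete form of the criterion: summed squared distance in center and precision."""
    return sum((c - v[0]) ** 2 + (p - v[1]) ** 2 for c, p in field)


def adjusted_interpretation(field):
    """Least-squares representative of a discrete Field: the componentwise mean."""
    return (fmean(c for c, _ in field), fmean(p for _, p in field))


field = [(0.0, 1.0), (2.0, 3.0), (4.0, 2.0)]  # hypothetical (center, precision) pairs
best = adjusted_interpretation(field)
print(best, squared_deviation(field, best))
```

Moving the representative away from the mean in either coordinate can only increase the summed squared deviation, which is why the mean is used here.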




Fig. 3 Effects of the semantic adjustment principle. Left: center adjustment. Top: expressions 
with complexity <2; Bottom: complexity <3. μ(): characteristic function of the Domain. 
Expressions: Cold, Mid, Hot, Very Cold, Very Hot. Right: precision adjustment. Top: 
expressions with complexity <3; Bottom: complexity <4. Expressions: Centered, Exactly 
Centered, Rather Centered, Very Exactly Centered. 



3 An Algorithm for the Interpretation of Linguistic Variables 

The algorithm returns the center and precision of a triangular centroid covering the 
Field of an expression, assuming that a value is described by the closest expression. 
The Domain is [Pl,Ph]×[Ll,Lh]. The Complexity function is the number of words minus 1. 
Centers and precisions are unknown but can be compared between expressions. There 
are no synonyms. Initially, the array E[] contains the N expressions of length <= C, in 
lexicographic order. Three variables are calculated from E: the matrix M[,] contains 
the indices of expressions of E[], arranged by increasing center (columns) and precision (lines); 
NL[]: lines used per column of M[,]; NC: used columns in M[,]. 

Centroid(Ex) // returns a centroid on the Field of Ex 
    BuildM() 
    (Ln, Co) = Position(Ex), (Pc, Wc) = Centroid(Ln, Co) 
    Lowb = max(Pl, Pc - (Ph-Pl)/(2(NC+1)) - (Wc + (Lh-Ll)/(2(NL[Co]+1)))) 
    Higb = min(Ph, Pc + (Ph-Pl)/(2(NC+1)) + (Wc + (Lh-Ll)/(2(NL[Co]+1)))) 
    return (Lowb + (Higb-Lowb)/2, (Higb-Lowb)/2) 

BuildM() // initializes M[,] 
    fill M[,] with 0, fill NL[] with 0, NC = 0 
    for Ex = 1 to N 
        for Co = 1 to Ex: if M[1,Co] = 0 or π(E[M[1,Co]]) = π(E[Ex]) break 
        for Ln = 1 to Ex: if M[Ln,Co] = 0 break 
        M[Ln,Co] = Ex, NC = Co, NL[Co] = Ln 

Position(Ex) // position Ln,Co of Ex in M[,] (not given) 

Centroid(Ln, Co) // center & precision of M[Ln,Co] 
    return (Pl + Co*(Ph-Pl)/(NC+1), Ll + Ln*(Lh-Ll)/(NL[Co]+1)) 
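
The pseudocode above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the author's implementation: expressions are represented only by hypothetical integer ranks (center_rank, precision_rank) that allow comparison, the list of columns plays the role of M[,], NL[] and NC, indices are 0-based for expressions, centers are spread over columns and precisions over lines, and the divisor "2(NC+1)" is read as 2 × (NC+1).

```python
from itertools import groupby


def build_columns(ranks):
    """Group expression indices by equal center rank (the role of M[,], NL[] and NC).

    `ranks` lists one (center_rank, precision_rank) pair per expression,
    already sorted lexicographically and free of synonyms.
    """
    indices = range(len(ranks))
    return [list(group) for _, group in groupby(indices, key=lambda i: ranks[i][0])]


def field_centroid(ex, ranks, pl, ph, ll, lh):
    """Center and half-width of a triangular centroid over the Field of expression `ex`."""
    columns = build_columns(ranks)
    co = next(k for k, col in enumerate(columns, start=1) if ex in col)
    ln = columns[co - 1].index(ex) + 1
    nc, nl = len(columns), len(columns[co - 1])
    pc = pl + co * (ph - pl) / (nc + 1)   # equidistant centers
    wc = ll + ln * (lh - ll) / (nl + 1)   # equidistant precisions
    reach = (ph - pl) / (2 * (nc + 1)) + wc + (lh - ll) / (2 * (nl + 1))
    lowb = max(pl, pc - reach)
    higb = min(ph, pc + reach)
    return lowb + (higb - lowb) / 2, (higb - lowb) / 2


# Qualitative language {left, center, right}: three centers, one precision each.
ranks = [(0, 0), (1, 0), (2, 0)]
for ex in range(3):
    print(ex, field_centroid(ex, ranks, pl=0.0, ph=180.0, ll=0.0, lh=10.0))
```

The example domain [0, 180] × [0, 10] is arbitrary; only the ordering of the expressions is used, which is the point of a Qualitative language.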








4 Discussion 

The algorithm presented here allows for the use of Linguistic Modeling without its 
drawbacks: since the Meanings of the expressions are automatically calculated in any 
context, less arbitrary parameters are needed. Hence, the optimality of the resulting 
model is improved, following AIC or any similar information criterion [1][10]. 

The Linguistic Approximation Principle is stated in the context of the modeling 
process. Following the rationale of Grice’s conversational rules [3], it postulates that 
the complexity-accuracy tradeoff stated by Zadeh [12] partly comes from the writer’s 
behavior. It could be easily applied to human-machine interfaces, which entail 
activities quite similar to the representation of values. In contrast, its generalization to 
natural language communication (e.g. description of qualities, identification of objects) 
may bring several difficulties: 1) the complexity function must handle 
neuropsychological factors (e.g. frequency effects, phonological complexity, parallel 
pathways in language processing) [2]; 2) due to interferences and real time limitations, 
the principle may be respected only in a probabilistic way. Stochastic simulations by 
Monte Carlo methods and systematic tests on normal subjects may be a necessary step 
before any generalization can be made [5]. 

The Semantic Adjustment Principle entails both concrete predictions and a 
theoretical question. As the Meaning of the expressions changes with the context (e.g. 
“hot”, when applied to an oven or a swimming pool), a language cannot continue to 
be considered a system of values, in the Saussurian terminology [6] (value=Meaning), 
but rather a set of specific systems, one for each specific situation. Standard Meaning 
should be the minimal information necessary for finding specific Interpretations. 

References 

1. Akaike H., "A New Look at the Statistical Model Identification", IEEE Trans. on Automatic 
Control, AC-19 No. 6, pp. 716-723, 1974. 

2. Ellis A. W., Young A. W., "Human Cognitive Neuropsychology", Lawrence Erlbaum 
Associates Publishers, 1988. 

3. Grice H. P., "Logic and conversation", in "Syntax and Semantics - volume 3 - Speech 
acts", Cole & Morgan editors, Academic Press, 1975. 

4. Katz P., "Digital Control using microprocessors", Prentice Hall, 1981. 

5. Kirkpatrick S., Gelatt C. D., Vecchi M. P., "Optimization by simulated annealing", Science 
vol. 220 pp. 671-680, 1983. 

6. Saussure F., "A Course in General Linguistics", Duckworth, London, 1983. 

7. Slaughter J., "Quantization errors in digital control systems", IEEE Trans. on Automatic 
Control, AC-9 pp. 70-74, 1964. 

8. Sugeno M., Yasukawa T., "A Fuzzy-Logic-Based Approach to Qualitative Modeling", 
IEEE Trans. on Fuzzy Systems, vol. 1 No. 1, pp. 7-31, 1993. 

9. Yager R. R., "On the Interpretation of Fuzzy IF-THEN Rules", Applied Intelligence vol. 6 
No. 2 pp. 141-151, 1996. 

10. Yen J., Wang L., "Application of Statistical Information Criteria for Optimal Fuzzy Model 
Construction", IEEE Trans. on Fuzzy Systems vol. 6 No. 362-372, 1998. 

11. Zadeh L. A., "Fuzzy sets", Inform. Contr., vol. 8 pp. 338-353, 1965. 

12. Zadeh L. A., "Outline of a New Approach to the Analysis of Complex Systems and 
Decision Processes", IEEE Trans. SMC-3 No. 1, pp. 28-44, 1973. 




A Fuzzy Inference Algorithm for Lithology Analysis in 
Formation Evaluation 

Hujun Li*, Fansheng Li, Andrew H. Sung, William W. Weiss 

New Mexico Petroleum Recovery Research Center 
Department of Computer Science 
New Mexico Institute of Mining and Technology 
Socorro, New Mexico 87801, USA 
*Correspondence: hujunli@baervan.nmt.edu 
{fsli, sung, weiss}@nmt.edu 



Abstract. This paper presents a novel application of fuzzy logic in the 
interpretation of well logs, specifically, in determining the formation 
rock types in Petroleum Engineering. To solve this practical problem, a 
new inference algorithm is proposed. The interpretation of well logs is a 
decision-making problem where the issue is to utilize (and compromise) 
knowledge from human experts, evidence from well logs, and 
information from other sources. This research was motivated by the fact 
that fuzzy logic has proven to be highly effective in many applications 
involving uncertainties. Compared with neural networks, this 
fuzzy-logic-based method avoids the problems of training data collection, 
network training, and unavailability of rules or knowledge used in the 
interpretation. This results in an algorithm that is effective and 
inexpensive. 

1 Introduction 

Interest in using fuzzy logic in the petroleum industry has been rising rapidly [1-7]. 
Petroleum engineering is a comprehensive, multi-disciplinary endeavor. Fuzzy logic 
can find wider uses since, in many tasks, it is inevitable to have to deal with 
incomplete or inexact information; and fuzzy logic provides a framework for building 
systems that can better handle uncertainties and incompleteness in information. 

In this paper, we present an application of fuzzy logic in interpreting well log 
curves. There are several kinds of log curves obtained by using different logging 
methods such as radiation logging or Gamma-ray, electrical and magnetic logging, 
which can be applied in an oilfield during different phases of oil production. Different 
parameters can be determined from the interpretation of different logs. 

Well logs may be interpreted directly by human experts; or software tools can be 
developed to assist in the expert's interpretation. Log interpretation is a complicated 
issue; for illustration purposes, we concentrate our investigation on determining 
formation types from the interpretation of density and Gamma-ray logs in this paper. 



^ Support for this research received from the State of New Mexico and the US Department of 
Energy is gratefully acknowledged. 

R. Loganantharaj et al. (Eds.): IEA/AIE 2000, LNAI 1821, pp. 290-295, 2000. 

© Springer-Verlag Berlin Heidelberg 2000 







2 Well Logs 

To understand how oil flows and behaves in a reservoir, it is necessary to acquire 
knowledge about its properties, such as rock types. One way to learn the properties is 
to take cores during drilling and conduct laboratory measurements. However, coring 
is very expensive; besides, the rock properties may change when the cores are brought 
up to the surface due to changes in pressure and temperature. A log provides an indirect, 
less expensive way of measuring the properties and is widely used in oilfields [8]. 





Fig. 1. Portions of a density log curve and a Gamma-ray log curve 

A log curve is a graph recording changes of a parameter vs. the depth. A density 
log records formation densities with depths; and a Gamma-ray log reflects strength of 
Gamma-ray radiation of formations with depths. See Fig. 1. One parameter can be 
determined from the interpretation of one or more log curves; likewise, one curve can 
be used to interpret one or more parameters. Therefore, the amount of uncertainty 
during the interpretation of the log curves varies. 

Figure 1 gives a portion of an actual density log curve and a portion of a Gamma-ray 
log curve from an oil well. One of the parameters interpreted from the density log 
curve is rock type, such as dolomite, limestone, sandstone, shale, gypsum and salt. 
Rock types can also be determined by interpreting Gamma-ray log curves. 



Table 1. Densities and Gamma-ray strength of pure rocks [9] 

Rock Type    Density (g/cm³)    Gamma-ray (relative) 
Dolomite     2.83-2.87          2-4 
Limestone    2.71               2-4 
Sandstone    2.65               4-7 
Shale        2.3-2.7            >7 
Gypsum       2.33               2-3 
Salt         2.08               1-3 



Typical densities in g/cm³ and relative strengths of Gamma-ray for some pure 
rocks, which are normalized to the range of [0, 10], are given in Table 1. This is the 
basis for interpreting density and Gamma-ray logs. An actual formation in a reservoir, 
however, may contain more than one pure rock; that is, the formation is actually a 
mixture of various rocks. Therefore, the density value at a certain depth on the density 
log is not necessarily equal to any density value given in Table 1. Likewise, 



292 Hujun Li et al. 



a similar phenomenon occurs with the Gamma-ray log. This is one source of uncertainty 
in the interpretation of the logs. 

In addition to the uncertainty resulting from mixed rocks in a formation, there 
exists another uncertainty in the interpretation of the well logs. It can be seen from 
Table 1 that the density, even for some pure rocks such as dolomite and shale, is not a 
unique value; and that the density of gypsum falls right in the density range of shale. 
The situation is even worse for Gamma-ray: for each rock in the table, its Gamma-ray 
value is not unique but falls in a range; and the ranges of different rocks overlap. It 
therefore appears fitting to use fuzzy logic to model human experts' knowledge in 
interpreting the logs to determine the rock types of the formation. 

3 Fuzzy Sets and Rule Base 

Based on the knowledge of experts, we can formulate a rule base for determining rock 
types by combining interpretations of the density and the Gamma-ray logs. Table 2 is 
a rule base obtained by investigating the rules used by a human expert, and it will be used in 
our performance study. In this paper, we consider only the rocks listed in Table 1. In 
actual interpretation of logs, there may well be more rock types in the formation. 



Table 2. A rule base for the interpretation of the well logs 

GR \ DL     Dolomite          Limestone         Sandstone         Shale             Gypsum            Salt 
Dolomite    X                 DL(1.0) GR(0.1)   DL(1.0) GR(0.1)   DL(0.1) GR(1.0)   DL(1.0) GR(0.1)   DL(1.0) GR(0.1) 
Limestone   DL(1.0) GR(0.1)   X                 DL(1.0) GR(0.1)   DL(0.1) GR(1.0)   DL(1.0) GR(0.1)   DL(1.0) GR(0.1) 
Sandstone   DL(0.7) GR(0.3)   DL(0.2) GR(0.8)   X                 DL(0.1) GR(0.9)   DL(0.4) GR(0.6)   DL(0.4) GR(0.6) 
Shale       DL(0.3) GR(0.7)   DL(0.1) GR(1.0)   DL(0.1) GR(0.9)   X                 DL(0.7) GR(0.3)   DL(0.3) GR(0.7) 
Gypsum      DL(1.0) GR(0.1)   DL(1.0) GR(0.1)   DL(0.1) GR(1.0)   DL(0.1) GR(1.0)   X                 DL(1.0) GR(0.1) 
Salt        DL(1.0) GR(0.1)   DL(0.9) GR(0.1)   DL(0.1) GR(1.0)   DL(0.2) GR(0.8)   DL(1.0) GR(0.1)   X 

Note: DL denotes density log and GR denotes Gamma-ray log. Rows give the rock type indicated by the Gamma-ray log; columns give the rock type indicated by the density log. 



In Table 2, figures within parentheses next to DL or GR denote the degrees of 
credibility of the corresponding conclusion. For instance, DL(0.7) means that a 
weight of 0.7 is given to the conclusion drawn from the density log; while GR(0.3) 
means that a weight of 0.3 is given to the conclusion reached from the Gamma-ray 
log. In other words, DL(0.7) means that the likelihood that “the actual formation type 
is identical to that interpreted from the density log” is 70%; similarly, GR(0.3) means 
that the likelihood that “the formation type is identical to the result interpreted from the 
Gamma-ray log” is 30%. An X means that the results inferred from the two logs are identical. 

Two groups of fuzzy sets are given for the density and the Gamma-ray logs, based 
on the data in Table 1 and experts’ knowledge. Fuzzy sets for the two logs are shown 
in Fig. 2. The range of density values for each of the six variables is symmetrically 
extended by 0.1 on either side of the range; likewise, the value range of Gamma-ray 
strength (g) for each of the six variables is symmetrically extended by 1.0. 
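
The construction of the density fuzzy sets can be sketched as follows. This is an illustrative assumption rather than the authors' code: the membership grade is taken to be 1 inside the Table 1 range and to fall linearly to 0 over the 0.1 extension on each side (the exact shape is only shown in Fig. 2), and the names `membership` and `density_vector` are hypothetical. The Gamma-ray sets would be built the same way with an extension of 1.0, but they are omitted here because the treatment of the open-ended shale range (>7) is not specified in the text.

```python
ROCKS = ["dolomite", "limestone", "sandstone", "shale", "gypsum", "salt"]

# Density ranges (g/cm^3) of pure rocks, taken from Table 1.
DENSITY_RANGE = {
    "dolomite": (2.83, 2.87), "limestone": (2.71, 2.71), "sandstone": (2.65, 2.65),
    "shale": (2.3, 2.7), "gypsum": (2.33, 2.33), "salt": (2.08, 2.08),
}


def membership(x, lo, hi, margin):
    """Grade 1 on [lo, hi], falling linearly (an assumption) to 0 at lo - margin and hi + margin."""
    if lo <= x <= hi:
        return 1.0
    distance = lo - x if x < lo else x - hi
    return max(0.0, 1.0 - distance / margin)


def density_vector(d, margin=0.1):
    """Membership grades of density d in the six rock types, in the order of ROCKS."""
    return [membership(d, *DENSITY_RANGE[rock], margin) for rock in ROCKS]


print([round(m, 2) for m in density_vector(2.77)])
print([round(m, 2) for m in density_vector(2.38)])
```

Under this linear assumption the two calls give the density vectors D(2.77) and D(2.38) that are used in the examples of Section 5.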











[Figure: two panels of membership functions (vertical axis: membership grade, 0 to 1) for Salt, Gypsum, Dolomite, Limestone, Sandstone and Shale; horizontal axes: Density (1.8 to 3.4) and GR (0 to 10).] 

Fig. 2. Fuzzy sets for densities and Gamma-ray strengths 



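Thus, we have two vectors from the two groups of fuzzy sets for the two log 
curves, respectively. 

$$D(d) = \langle D_1(d), D_2(d), D_3(d), D_4(d), D_5(d), D_6(d) \rangle \tag{1}$$

$$G(g) = \langle G_1(g), G_2(g), G_3(g), G_4(g), G_5(g), G_6(g) \rangle \tag{2}$$

Each vector represents a discrete fuzzy set for a rock type. That is, D(d) and G(g) 
represent the following fuzzy sets, respectively: 

$$D(d) = \frac{D_1(d)}{\text{dolomite}} + \frac{D_2(d)}{\text{limestone}} + \frac{D_3(d)}{\text{sandstone}} + \frac{D_4(d)}{\text{shale}} + \frac{D_5(d)}{\text{gypsum}} + \frac{D_6(d)}{\text{salt}} \tag{3}$$

$$G(g) = \frac{G_1(g)}{\text{dolomite}} + \frac{G_2(g)}{\text{limestone}} + \frac{G_3(g)}{\text{sandstone}} + \frac{G_4(g)}{\text{shale}} + \frac{G_5(g)}{\text{gypsum}} + \frac{G_6(g)}{\text{salt}} \tag{4}$$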



4 Fuzzy Inference Method 

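From Table 2, we can extract two matrices, the density possibility matrix and the 
Gamma-ray possibility matrix, given in (5) below. 

$$DL = \begin{bmatrix} DL(1,1) & \cdots & DL(1,6) \\ \vdots & DL(i,j) & \vdots \\ DL(6,1) & \cdots & DL(6,6) \end{bmatrix} \quad \text{and} \quad GR = \begin{bmatrix} GR(1,1) & \cdots & GR(1,6) \\ \vdots & GR(i,j) & \vdots \\ GR(6,1) & \cdots & GR(6,6) \end{bmatrix} \tag{5}$$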



The inference algorithm adopted for this particular problem consists of three steps: 

- First calculate the two vectors D(d) and G(g) using (1) and (2) for density d and 
Gamma-ray g from the given log curves. These two vectors of membership grades 
in different rock types are evaluated from each corresponding group of fuzzy sets. 

- Then, the rules in Table 2 are “fired”, and we obtain two groups of fuzzy sets for 
rock type represented by the following two equations: 

$$CD(i) = D(d) \bullet DL(i, \cdot) \quad \text{for each } i \text{ such that } G_i(g) > 0 \tag{6}$$

$$CG(j) = G(g) \bullet GR(\cdot, j) \quad \text{for each } j \text{ such that } D_j(d) > 0 \tag{7}$$






where DL(i, ·) is the i-th row of matrix DL; GR(·, j) is the j-th column of matrix GR; 
and • denotes the element-by-element multiplication of two vectors. The 6-element 
vector CD(i) gives the “degree of confidence” (for each of the 6 rock types) derived 
from the density evidence, under the condition that the Gamma-ray evidence points 
to rock type i (i = 1 for dolomite, i = 2 for limestone, i = 3 for sandstone, etc.) with 
fuzzy membership or degree of validity $G_i(g)$. Likewise, the 6 values of vector 
CG(j) give the degree of confidence of the conclusion derived from the Gamma-ray 
evidence, under the condition that the density evidence points to rock type j, 
with degree of validity $D_j(d)$. 

- Finally, summarize all the conclusions CD and CG to give the final conclusion, 
FC, for each rock type, which is calculated using the formula below: 

$$FC = \bigoplus_{i=1}^{6} CD(i) \oplus \bigoplus_{j=1}^{6} CG(j) \tag{8}$$

The 6-element vector FC gives the degrees of confidence of the conclusion that the 
formation is of each of the 6 rock types, where $\oplus$ is a fuzzy union operator. 

5 Examples 

5.1 Example 1 

In this case, d = 2.38, g = 6.30, and density points to either shale or gypsum, with d = 
2.38 giving membership grades (degrees of confidence) of 1 and 0.5, respectively, for 
shale and gypsum; while Gamma-ray points to sandstone with membership grade 1. 

D(2.38) = <0.0, 0.0, 0.0, 1.0, 0.5, 0.0>; G(6.30) = <0.0, 0.0, 1.0, 0.0, 0.0, 0.0> 
CD(3) = D(2.38) • DL(3, ·) = <0.0, 0.0, 0.0, 0.1, 0.2, 0.0> 
CG(4) = G(6.30) • GR(·, 4) = <0.0, 0.0, 0.8, 0.0, 0.0, 0.0> 
CG(5) = G(6.30) • GR(·, 5) = <0.0, 0.0, 0.7, 0.0, 0.0, 0.0> 

Here, only one conclusion, CD(3), is obtained based on the density evidence, since 
the simultaneous Gamma-ray evidence has pointed to the unique conclusion of 
sandstone. Two conclusions, CG(4) and CG(5), are obtained from the Gamma-ray 
evidence, since the simultaneous density evidence points to either shale or gypsum. 

To reconcile the three conclusions, the standard fuzzy union operator (taking the 
maximum) is used to calculate the final conclusion, FC = <0.0, 0.0, 0.8, 0.1, 0.2, 
0.0>. Other union operators may alternatively be used to find FC, or we can take the 
average of all conclusions; but the max operator is arguably the most reasonable to 
use here. 



5.2 Example 2 

In this case, d = 2.770, g = 3.000, then 

D(2.77) = <0.4, 0.4, 0.0, 0.3, 0.0, 0.0>; G(3.00) = <1.0, 1.0, 0.0, 0.0, 1.0, 1.0>; 
CD(1) = <0.4, 0.4, 0.0, 0.03, 0.0, 0.0>; CD(2) = <0.4, 0.4, 0.0, 0.03, 0.0, 0.0>; 
CD(5) = <0.4, 0.4, 0.0, 0.03, 0.0, 0.0>; CD(6) = <0.4, 0.36, 0.0, 0.06, 0.0, 0.0>; 
CG(1) = <1.0, 0.1, 0.0, 0.0, 0.1, 0.1>; CG(2) = <0.1, 1.0, 0.0, 0.0, 0.1, 0.1>; 
CG(4) = <1.0, 1.0, 0.0, 0.0, 1.0, 0.8> 

Hence, the conclusion is FC = <1.0, 1.0, 0.0, 0.06, 1.0, 0.8>. 
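
The computation in Example 2 can be reproduced with a short sketch of the three inference steps. It is an illustration under stated assumptions, not the authors' implementation: the matrices are transcribed from Table 2 with rows indexed by the Gamma-ray rock type and columns by the density rock type, the diagonal "X" entries are taken as 1.0 (an assumption that is consistent with the vectors listed in Example 2), the union operator is the maximum, and the function name `infer` is hypothetical.

```python
# Table 2 weights; rows: rock indicated by Gamma-ray, columns: rock indicated by density.
# The diagonal "X" entries are taken as 1.0 (an assumption consistent with Example 2).
DL = [
    [1.0, 1.0, 1.0, 0.1, 1.0, 1.0],
    [1.0, 1.0, 1.0, 0.1, 1.0, 1.0],
    [0.7, 0.2, 1.0, 0.1, 0.4, 0.4],
    [0.3, 0.1, 0.1, 1.0, 0.7, 0.3],
    [1.0, 1.0, 0.1, 0.1, 1.0, 1.0],
    [1.0, 0.9, 0.1, 0.2, 1.0, 1.0],
]
GR = [
    [1.0, 0.1, 0.1, 1.0, 0.1, 0.1],
    [0.1, 1.0, 0.1, 1.0, 0.1, 0.1],
    [0.3, 0.8, 1.0, 0.9, 0.6, 0.6],
    [0.7, 1.0, 0.9, 1.0, 0.3, 0.7],
    [0.1, 0.1, 1.0, 1.0, 1.0, 0.1],
    [0.1, 0.1, 1.0, 0.8, 0.1, 1.0],
]


def infer(D, G):
    """Combine density grades D and Gamma-ray grades G into the final conclusion FC."""
    n = len(D)
    conclusions = []
    for i in range(n):
        if G[i] > 0:  # equation (6): D weighted by row i of DL
            conclusions.append([D[k] * DL[i][k] for k in range(n)])
    for j in range(n):
        if D[j] > 0:  # equation (7): G weighted by column j of GR
            conclusions.append([G[k] * GR[k][j] for k in range(n)])
    if not conclusions:
        return [0.0] * n
    # equation (8) with the standard fuzzy union (element-wise maximum)
    return [max(c[k] for c in conclusions) for k in range(n)]


D = [0.4, 0.4, 0.0, 0.3, 0.0, 0.0]  # D(2.77) from Example 2
G = [1.0, 1.0, 0.0, 0.0, 1.0, 1.0]  # G(3.00) from Example 2
print([round(x, 2) for x in infer(D, G)])
```

Replacing `max` by another union operator, or by an average, only changes the last line of `infer`, which makes it easy to compare the alternatives mentioned in Example 1.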







6 Conclusions 

In this paper we presented a method of using fuzzy logic to interpret well logs. Due to 
the problem's unique features, a “rule base” for interpretation is formulated to indicate 
the degree of confidence of a human expert's conclusion based on a given log curve 
(or, equivalently, the degree of credibility of an interpretation based on a specific 
piece of evidence). A fuzzy inference algorithm is proposed for use in conjunction with the rule 
base to obtain an interpretation. A preliminary study of experimental results indicates 
great potential for using fuzzy logic in well log interpretation. 

It should be noted that our method is proposed to interpret formation properties 
from two log curves. More generally, the method is meant for decision making based 
on two information sources. To perform decision-making based on three sources, the 
rule base will be a 3-D structure, and it may become much more difficult to obtain the 
rule base. However, it is very rarely the case that more than two log curves are used 
to interpret a single property in real-life petroleum applications. We can handle the 
problem by reducing it to a series of decisions, each based on two sources of 
information; this is currently under investigation [11]. 



References 

1. Z.G. Lian, et al. Integration of Fuzzy Methods into Geostatistics for 
Petrophysical Property Distribution, SPE 49964, SPE Asia Pacific Oil and Gas 
Conference and Exhibition, 1998. 

2. C.H. Wu, et al. Statistical and Fuzzy Infill Drilling Recovery Models for 
Carbonate Reservoirs, SPE 37728, Middle East Oil Conference & Exhibition, 
Manama, 1997. 

3. T.H. Chung, et al. Application of Fuzzy Expert Systems for EOR Project Risk 
Analysis, SPE 30741, SPE Annual Technical Conference & Exhibition, 1995. 

4. V.P. Rivera, Fuzzy Logic Controls Pressure In Fracturing Fluid Characterization 
Facility, SPE 28239, SPE Petroleum Computer Conference, 1994. 

5. H.J. Xiong, et al. An Investigation Into the Application of Fuzzy Logic to Well 
Stimulation Treatment Design, SPE 27672, SPE Computer Applications, 1995. 

6. C.D. Zhou, et al. Determining Reservoir Properties in Reservoir Studies Using a 
Fuzzy Neural Network, SPE 26430, the 68th Annual Technical Conference and 
Exhibition of the Society of Petroleum Engineers, 1993. 

7. H.C. Chen, et al. Novel Approaches to the Determination of Archie Parameters 
II: Fuzzy Regression Analysis, SPE 26288, SPE Advanced Technology Series, 
1996. 

8. David E. Johnson and Kathryne E. Pile, Well Log for the Nontechnical Person 
(Tulsa, Oklahoma: PennWell Publishing Company, 1988). 

9. Ed L. Bigelow, Introduction to Wireline Log Analysis (Houston: Western 
Atlas International, Inc., 1995). 

10. G.J. Klir, B. Yuan, Fuzzy Sets and Fuzzy Logic: Theory and Applications 
(Prentice Hall, 1995). 

11. H.J. Li, et al. Fuzzy Interpretation of Log Curves, Technical Report, Computer 
Science Department, New Mexico Tech, 1999. 




Approximating the 0-1 Multiple Knapsack Problem with 
Agent Decomposition and Market Negotiation 

Brent A. Smolinski 

Lawrence Livermore National Laboratory, CA, USA 



Abstract. The 0-1 multiple knapsack problem appears in many 
domains from financial portfolio management to cargo ship stowing. 
Algorithms for solving it range from approximate, with no lower 
bounds on performance, to exact, which suffer from worst case 
exponential time and space complexities. This paper introduces a 
market model based on agent decomposition and market auctions for 
approximating the 0-1 multiple knapsack problem, and an algorithm 
that implements the model (M(x)). M(x) traverses the solution space, 
much like simulated annealing, overcoming an inherent problem of 
many greedy algorithms. The use of agents ensures infeasible solutions 
are not considered while traversing the solution space and traversal of 
the solution space is both random and directed. M(x) is compared to a 
bound and bound algorithm and a simple greedy algorithm with a 
random shuffle. The results suggest M(x) is a good algorithm for 
approximating the 0-1 multiple knapsack problem. 



1 Introduction 

The 0-1 multiple knapsack problem is a generalization of the 0-1 knapsack problem 
arising when m knapsacks, of given capacities $c_i$ (i = 1, ..., m), are available with n 
items, where each item has value $v_j$ and size $s_j > 0$ (j = 1, ..., n). The objective is: 

$$\text{maximize} \quad \sum_{i=1}^{m} \sum_{j=1}^{n} v_j x_{ij}$$

$$\text{subject to} \quad \sum_{j=1}^{n} s_j x_{ij} \le c_i, \quad i \in M = \{1, \dots, m\}$$

$$\sum_{i=1}^{m} x_{ij} \le 1, \quad j \in N = \{1, \dots, n\}; \qquad x_{ij} = 0 \text{ or } 1$$

where $x_{ij} = 1$ if item j is assigned to knapsack i; 0 otherwise. The 0-1 multiple 
knapsack problem is NP-hard in the strong sense [15]. Multiple knapsack problems 
appear in such domains as financial portfolio management and naval ship stowing 
[18]. Typical problems in these domains have thousands of items and dozens of 
knapsacks. 
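
To make the formulation concrete, the sketch below evaluates a candidate solution against the objective and the capacity constraints. It is only an illustration: the function name `evaluate`, the encoding of a solution as one knapsack index per item (or `None` for an unassigned item), and the small data set are assumptions, not part of the paper.

```python
def evaluate(assignment, values, sizes, capacities):
    """Total value of an assignment, or None if it violates a capacity.

    assignment[j] is the knapsack index of item j, or None if item j is left out,
    so each item is in at most one knapsack by construction.
    """
    load = [0] * len(capacities)
    total = 0
    for j, i in enumerate(assignment):
        if i is None:
            continue
        load[i] += sizes[j]
        total += values[j]
    if any(used > cap for used, cap in zip(load, capacities)):
        return None
    return total


values = [10, 7, 4, 3]
sizes = [5, 4, 3, 2]
capacities = [6, 5]
print(evaluate([0, 1, None, None], values, sizes, capacities))  # feasible assignment
print(evaluate([0, 0, 1, None], values, sizes, capacities))     # knapsack 0 is overloaded
```

Encoding the solution as one index per item enforces the constraint that each item is assigned to at most one knapsack, so only the capacity constraints need an explicit check.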



Work performed under the auspices of the University of California, Lawrence Livermore National 
Laboratory under Contract W-7405-Eng-48. This work has been funded by LDRD UCRL-JC135996. 



R. Loganantharaj et al. (Eds.): IEA/AIE 2000, LNAI 1821, pp. 296-306, 2000. 
© Springer-Verlag Berlin Heidelberg 2000 







This research develops a model for approximating the 0-1 multiple knapsack 
problem based on agent decomposition and market negotiation. A simple algorithm 
that implements this model, M(x), is compared to a bound and bound algorithm (BB) 
and a greedy algorithm with a random shuffle (G(x)). This paper is organized as 
follows. Section 2 motivates the approach chosen in this research. Section 3 
discusses the use of markets to negotiate for resources. Section 4 describes a market- 
oriented model and algorithm for approximating the 0-1 multiple knapsack problem 
based on agent decomposition and auction negotiation mechanisms. Section 5 
describes the results obtained from running M(x), BB, and G(x) on a suite of test data. 
Conclusions and future work are given in Section 6. 



2 Motivation 

The simplest approach used to solve the 0-1 multiple knapsack problem is a greedy 
first fit: an item is placed into the first knapsack that can hold it. This algorithm runs 
relatively fast but has no lower bound on performance. To find better solutions 
to this problem, other approaches are used. Two deterministic approaches used to 
solve the 0-1 multiple knapsack problem are branch and bound [14] and bound and 
bound [15]. Both of these algorithms are guaranteed to find an optimal solution. 
However, they both have time complexities with upper bounds that are O(rn). Also, 
assuming a surrogate relaxation technique is used to calculate the upper bound at the 
decision nodes, they both have space complexities that are O(nC), where C is the size 
of the knapsack solved with the surrogate relaxation problem. 
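
A minimal sketch of the greedy first fit described above is given below. The function name `first_fit` and the tiny instance are hypothetical; the instance is chosen so that first fit fills the knapsack with two low-value items and then has no room for the most valuable one, which illustrates why the heuristic carries no performance guarantee.

```python
def first_fit(sizes, capacities):
    """Greedy first fit: each item goes into the first knapsack with enough room left."""
    remaining = list(capacities)
    assignment = []
    for size in sizes:
        for i, room in enumerate(remaining):
            if size <= room:
                remaining[i] -= size
                assignment.append(i)
                break
        else:
            assignment.append(None)  # no knapsack can hold this item
    return assignment


values = [3, 3, 10]
sizes = [2, 2, 4]
capacities = [4]
chosen = first_fit(sizes, capacities)
total = sum(v for v, i in zip(values, chosen) if i is not None)
print(chosen, total)
```

In this instance, packing only the third item would give a higher total value than the assignment found by first fit.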

What is desired is a method that will produce solutions that are better than those 
found by simple greedy algorithms, yet with time and space complexities which are 
not exponential in the size of the problem. Choices include a hill climbing algorithm 
and a stochastic search algorithm. Unfortunately, multiple knapsack problems tend to 
have jagged, discontinuous search surfaces with isolated global optima (caused 
by the co-locality of feasible and infeasible configurations). Hill climbing 
algorithms work poorly on search surfaces with large plateaus and sharp changes. 
Stochastic methods, such as genetic algorithms, tend to have a very difficult time 
finding optimal solutions on sparse surfaces where the global optimum is isolated 
[10,16]. 

One approach to solving complex problems is to divide a single problem into 
several, smaller problems, then solve each sub-problem independently. The result is 
obtained by combining all of the individual solutions. Using this approach each 
knapsack could be considered individually to find the optimal solution for each, thus 
reducing the computational complexity and memory requirements. However, 
because individual solutions may be inter-dependent, this will not always provide a 
valid, much less an optimal, answer. For a “divide-and-conquer” method to be 
successful, the sub-problems must be naturally disjoint. The sub-problems that arise 
with this method of problem decomposition are not disjoint. It is believed that the 
complexity of problems in the class NP-hard is due to the inability to naturally 
decompose the problems further. 




298 Brent A. Smolinski 



To satisfy inter-dependent constraints, the sub-problems must be solved together. 
Objects in the problem domain can be modeled as autonomous agents, or 
object-agents [1,7,18], where the interests (or goals) of the agents are consistent with the 
goals and constraints of the whole problem. For the 0-1 multiple knapsack problem, 
each knapsack will have an agent assigned with interests in maximizing the value of 
the items selected, subject to the constraint that they fit in their knapsack. However, 
similar to the divide and conquer method, finding a feasible solution for each agent 
does not guarantee a feasible solution to the whole problem. It is possible that more 
than one agent will want the same item(s) in its solution set. 

When conflicts arise, agents must negotiate. While much research has been 
conducted on conflict resolution [5,6,20,30], many of these approaches rely upon 
negotiation through the exchange of partial plans, which may generally be very 
difficult to develop for this problem. Chapman [2] showed that general planning is 
undecidable even with finite initial states. Remembering that the reason for 
decomposing the problem was to produce a method for solving the problem that is 
fast and effective, this negotiation method may be ineffective because it is not 
guaranteed to be fast (or even to halt). For a method of agent decomposition to be 
effective, a fast, possibly approximate, negotiation mechanism is needed. 

One promising approach is the use of a market heuristic for negotiation. The nice 
thing about markets is that they can achieve their effects through simple interactions. 
They are able to solve complex resource allocation problems with very little 
information (prices), which makes them a desirable approach for resolving conflicts 
that arise when allocating resources [4]. 

Markets may not always provide the best means for allocation. For example, the 
existence of firms suggests that centrally planned problem solving paradigms may 
work better on smaller resource allocation problems. Intuitively, this makes sense. 
Even though a problem may in theory be computationally complex (i.e. NP-Hard), 
small instances of the problem may be solved in a reasonable amount of time. This 
paper demonstrates that the market paradigm works well for multiple knapsack 
problems whose size is sufficiently large. 



3 Market Applications 

Market-oriented algorithms have been applied successfully towards solving a large 
class of complex resource allocation problems [4,11,12,24,28,29]. Most of the 
current research has focused on developing one of three types of economic models. 
One is an exchange based economy [21]; another is a price based economy 
[17,24,27,28]. Neither of these models fits the 0-1 multiple knapsack problem, for 
which the third type, an approach using auctions, is needed [22,23]. 

In a price based economy producers and consumers participate in a market for 
goods and resources. Producers buy resources and transform them into commodities 
and sell them to consumers. Producers may also be consumers. Goods and resources 
are allocated through a market place in which equilibrium is determined by pricing 
mechanisms. Algorithms of this genre are typically on-line and solve problems of 




Approximating the 0-1 Multiple Knapsack Problem 299 



scheduling and resource allocation over time, where the elements of the problem are 
continuously changing and naturally distributed. These algorithms are not 
appropriate for solving the 0-1 multiple knapsack problem since all items in the 
problem domain are known and constant. Thus, the modeling of producers does not 
make sense. 

In an exchange based approach each agent is endowed with limited resources. 
The agents exchange these resources until their marginal rates of substitution are 
equal. They only exchange resources in the direction of increasing overall utility. In 
other words, they exchange resources such that at least one agent is made better off 
and no agent is made worse off. A Pareto optimal allocation is achieved when no 
exchange can take place without making some agent worse off. The problem with 
this approach is that it must converge to an equilibrium point, which may be far from 
a globally optimal solution. The market is unable to move from this equilibrium point 
because exchanges are constrained to occur in the direction of increasing utility. The 
rate of convergence degrades catastrophically as the number of items increases 
because in the worst case, all combinations of exchanges will have to be considered 
before equilibrium is reached. 

The auction market is similar to a price based economy but without producers. 
Agents bid for the items offered in a central market. Prices are determined through 
tatonnement or non-tatonnement processes. In tatonnement processes [3,26], each 
agent is endowed with specific initial wealth. Initial prices for resources are 
arbitrarily set. An agent computes its demand for a resource based on its utility 
function and budget constraints. It sends its demand to a central auctioneer, which in 
turn computes the aggregate demand for the resources. If excess demand is positive, 
the price of the resource is incremented. If excess demand is negative, the price is 
lowered. Once the price is adjusted, excess demand is computed again. This process 
continues until supply exactly equals demand (equilibrium is reached). Determining 
an equilibrium price can be computationally expensive because an unknown number 
of price postings and aggregate demand calculations must take place before 
equilibrium is reached. In non-tatonnement processes, agents are allowed to trade 
before the economy (or market) has reached equilibrium. Such a process is fast 
because trading begins before equilibrium is reached. A possible disadvantage is that 
intermediate trading never decreases an agent’s utility. As in an exchange based 
economy, this can result in a Pareto equilibrium allocation that is not a global 
optimum. 
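
The tatonnement loop described above can be sketched for a single divisible resource. This is a minimal illustration under stated assumptions, not the process used in M(x): each agent is assumed to spend its whole wealth on the resource (demand = wealth / price), and the auctioneer is assumed to move the price in proportion to excess demand. The function name, step size, and tolerance are hypothetical.

```python
from typing import Callable, List, Tuple


def tatonnement(
    demands: List[Callable[[float], float]],
    supply: float,
    price: float = 1.0,
    step: float = 0.05,
    tol: float = 1e-6,
    max_rounds: int = 10_000,
) -> Tuple[float, int]:
    """Post prices until aggregate demand matches supply; no trade occurs before that."""
    for rounds in range(1, max_rounds + 1):
        # The auctioneer aggregates the demand that every agent reports at this price.
        excess = sum(d(price) for d in demands) - supply
        if abs(excess) < tol:
            return price, rounds
        # Positive excess demand raises the price, negative excess demand lowers it.
        price = max(price + step * excess, 1e-9)
    raise RuntimeError("no equilibrium found within max_rounds")


# Hypothetical agents: each demands wealth / price units of the resource.
wealth = [10.0, 20.0, 30.0]
demands = [lambda p, w=w: w / p for w in wealth]
price, rounds = tatonnement(demands, supply=12.0)
```

With these numbers the equilibrium price is total wealth divided by supply, 60 / 12 = 5. The number of price postings needed is not known in advance and depends on the step size, which is the cost the text attributes to tatonnement.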

Auction strategies include single sided call auction, sealed bid auction, continuous 
auction, and double auction [8,23]. The method described in this paper is a 
combination of a sealed bid and continuous auction. Algorithms using sealed bid 
auctions to solve the 0-1 multiple knapsack problem are simple and fast because a 
non-tatonnement process of calculating a clearing price is used. Highest bids are the 
selling prices. A variation of a continuous auction approach is integrated into the 
method. Agents may trade and sell items back to the auctioneer in hope of getting a 
better allocation of resources. By relaxing the constraint that every trade results in no 
agent being made worse off, the algorithm may traverse the entire solution space. 







4 Developing a Market Oriented Algorithm 

M(x) implements a model, based on agent decomposition and market negotiation. 
That model includes consumers, an auctioneer, items, and protocols for trade amongst 
the agents and the auctioneer. M(x) implements the auctioneer’s and consumers’ 
behavior as well as protocols for the trading. This research describes one bidding 
strategy and one market mechanism, though other combinations of bidding strategies 
and market mechanisms can be implemented using the market model developed in 
this research. 



Market Model. The model is shown in Figure 1. It describes an auction and 
consumers. The auction consists of an agent (the auctioneer) and the items to be 
auctioned. Each consumer represents a knapsack. Interactions between agents are 
one of three types: 1) consumers purchase items from the auctioneer (Purchase 
protocol), 2) consumers sell items to the auctioneer (Sell protocol), 3) consumers 
swap items with each other (Exchange protocol). The Purchase protocol defines how 
items will be allocated through the auction. The Sell and Exchange protocols allow 
algorithms that implement the model to traverse the solution space. Each algorithm is 
free to define the particular mechanics of agent behavior and exchange. 



Purchase Protocol. This section describes the purchase protocol in M(x). Items are 
considered one at a time. This makes for a simple calculation of the clearing price: 
the highest bid. It is assumed that a consumer will prefer items with larger 
preciousness and smaller size. Items that do not fit into the knapsack will not be bid 
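upon. This way, the algorithm will not consider infeasible solutions. The utility 
consumer i will receive from having item j ($U_{ij}$) is: 

$$U_{ij} = \begin{cases} 0 & \text{if } b_i + s_j > c_i \\ p_j \left( \dfrac{a_i + s_j}{c_i} + 1 \right) & \text{if } b_i + s_j \le c_i \end{cases} \qquad (1)$$

where: 

$$p_j = \frac{v_j}{s_j} \qquad (2)$$

$$b_i = \sum_{j=1}^{n} s_j x_{ij} \qquad (3)$$

$$a_i = c_i - b_i \qquad (4)$$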



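Assuming that consumer i has an initial wealth, or endowment of resources, $r_i$, and 
that $B_{ij}$, the bid consumer i makes for item j, equals $U_{ij}$, then the total 
amount consumer i spends can be bounded by: 

$$\sum_{j=1}^{n} B_{ij} \le \left( \eta_i + \frac{\eta_i(\eta_i - 1)}{2\eta_i} \right) \bar{p}_i \qquad (5)$$

where: 

$$r_i = \left( \eta_i + \frac{\eta_i(\eta_i - 1)}{2\eta_i} \right) \bar{p}_i \quad \text{for } i = 1 \ldots m \qquad (6)$$

$$\bar{p}_i = \max_j \left( p_j \min\left( \lfloor c_i / s_j \rfloor, 1 \right) \right) \quad \text{for } j = 1 \ldots n \qquad (7)$$

$$\eta_i = \sum_{j=1}^{n} \min\left( \lfloor c_i / s_j \rfloor, 1 \right) \qquad (8)$$
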




Fig. 1. Market Model. 

This means every consumer has enough resources to bid for all the items that form 
the optimal solution and their bidding strategy is consistent with their initial 
endowment of resources $r_i$. This bidding strategy is simple, does not require agents to 
build internal models of the state of the system, and makes bid calculations fast. 
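
As a rough illustration of how cheaply these quantities can be computed, the sketch below evaluates the count of individually fitting items and the largest preciousness among them for one knapsack. The endowment function assumes the form r = p(eta + eta(eta - 1) / (2 eta)), which is our reading of the endowment definition and should be treated as an assumption; all function names are hypothetical.

```python
from typing import List


def fits_count(capacity: int, sizes: List[int]) -> int:
    """Number of items that individually fit: sum of min(floor(c / s_j), 1)."""
    return sum(min(capacity // s, 1) for s in sizes)


def max_fitting_preciousness(capacity: int, values: List[int], sizes: List[int]) -> float:
    """Largest value-per-size ratio among the items that individually fit."""
    return max((v / s) * min(capacity // s, 1) for v, s in zip(values, sizes))


def endowment(capacity: int, values: List[int], sizes: List[int]) -> float:
    """Assumed endowment form: p * (eta + eta * (eta - 1) / (2 * eta))."""
    eta = fits_count(capacity, sizes)
    if eta == 0:
        return 0.0  # nothing fits, so the consumer never bids
    p = max_fitting_preciousness(capacity, values, sizes)
    return p * (eta + eta * (eta - 1) / (2 * eta))


# Hypothetical toy data: the size-5 item does not fit into a knapsack of capacity 4.
values = [10, 4, 3, 2]
sizes = [5, 4, 3, 2]
eta = fits_count(4, sizes)
p_max = max_fitting_preciousness(4, values, sizes)
r = endowment(4, values, sizes)
```

Each quantity is a single pass over the items, so no internal model of the market state is needed to compute a bid ceiling.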

The auctioneer simply tries to maximize its profit P: 

$$P = \sum_{i=1}^{m} \sum_{j=1}^{n} B_{ij} x_{ij}$$

This is achieved by selling the most precious items first. The auctioneer sorts the 
items in decreasing preciousness prior to bidding, allowing convergence to a good 
solution quicker when the size of the problem is large. 

With the agents’ behavior defined, a purchasing mechanism needs to be 
developed. The Purchase protocol is based on a sealed bid auction. The clearing 
price for an item is determined by the highest bidder, where items enter a market one 
at a time. If there are no bids greater than zero, the item is not sold. This process 
continues until all items have entered the market exactly once. 
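
One round of the Purchase protocol can be sketched as follows. The bidding rule here is a deliberately simplified stand-in (a consumer bids the item's preciousness if the item fits and zero otherwise), not the utility function of M(x); `bid`, `purchase_round`, and the toy data are hypothetical.

```python
from typing import Dict, List


def bid(value: int, size: int, capacity: int, used: int) -> float:
    """Stand-in bid: preciousness if the item still fits, otherwise no bid."""
    return value / size if used + size <= capacity else 0.0


def purchase_round(values: List[int], sizes: List[int], capacities: List[int]) -> Dict[int, int]:
    """Sealed bid auction, one item at a time; returns a map item -> winning consumer."""
    m = len(capacities)
    used = [0] * m
    owner: Dict[int, int] = {}
    # The auctioneer offers items in decreasing preciousness.
    order = sorted(range(len(values)), key=lambda j: values[j] / sizes[j], reverse=True)
    for j in order:
        bids = [bid(values[j], sizes[j], capacities[i], used[i]) for i in range(m)]
        winner = max(range(m), key=lambda i: bids[i])  # clearing price = highest bid
        if bids[winner] > 0:  # an item with no positive bid stays unsold
            owner[j] = winner
            used[winner] += sizes[j]
    return owner


# Hypothetical toy instance.
values = [10, 4, 3, 2]
sizes = [5, 4, 3, 2]
capacities = [5, 7]
owner = purchase_round(values, sizes, capacities)
```

Because a consumer never bids on an item that does not fit, every allocation produced this way is feasible; ties are broken here by consumer order, which is an arbitrary choice of the sketch.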






Sell Protocol. Using only the purchase protocol can give good results with some data 
sets; however, it is more likely to find only a local optimum. It would be desirable for the 
algorithm to traverse the solution space. To do this, a sell protocol is introduced 
where after every round of bidding, each consumer is allowed to sell back some of its 
items to the market (POST procedure). After the items are sold back, the agents enter 
into another round of bids, and so on for a predetermined number of rounds. It is 
assumed, for now, consumers will choose to sell back their least precious items first. 
Also, the number of items sold back to the market in M(x) decreases with subsequent 
rounds. This is similar in principle to a simulated annealing algorithm where 
configurations stabilize over time [13,25]. The number of items sold back in the 
early rounds should be around 50-100% of the items, and around 1 in the last rounds. 
With larger ratios of n / m, this should be close to 100% in the early rounds, which 
puts all items into the solution space quickly. With smaller ratios of n / m, it should 
be close to 50%, which mixes the ordering of the items quickly. It is important to 
give all items (and all combinations of items) an opportunity to enter the solution 
space. 
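
A decreasing sell-back schedule of this kind might look like the sketch below. The text only states the endpoints (50-100% of the items early, about one item late), so the linear interpolation between them is an assumption, as are the function names.

```python
from typing import Dict, List


def sell_back_count(round_index: int, num_rounds: int, held: int,
                    start_fraction: float = 1.0) -> int:
    """Items to sell back this round: start_fraction * held at first, about 1 at the end."""
    if held == 0:
        return 0
    if num_rounds <= 1:
        return 1
    progress = round_index / (num_rounds - 1)  # 0.0 in the first round, 1.0 in the last
    start = start_fraction * held
    count = round(start + (1 - start) * progress)  # assumed linear decay
    return max(1, min(held, count))


def items_to_sell(held_items: List[int], preciousness: Dict[int, float], count: int) -> List[int]:
    """Least precious items are sold back first."""
    return sorted(held_items, key=lambda j: preciousness[j])[:count]


first = sell_back_count(0, 10, held=8, start_fraction=1.0)
half = sell_back_count(0, 10, held=8, start_fraction=0.5)
last = sell_back_count(9, 10, held=8, start_fraction=1.0)
posted = items_to_sell([0, 1, 2], {0: 2.0, 1: 0.5, 2: 1.0}, 2)
```

A `start_fraction` near 1.0 corresponds to the large n / m case in the text, and a value near 0.5 to the small n / m case.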

Because the goal was to develop a fast algorithm, the price at which an item is sold 
back to the auctioneer is not computed. Assuming that the auctioneer buys an item back 
for an amount equal to what it was previously sold for, and that the auctioneer will always buy 
back items posted by consumers, the algorithm does not have to compute the sell 
back price nor keep track of individual budgets. If bid and sell back prices varied, 
consumers would have to alter bidding strategies as their budgets fluctuate. This 
would introduce an added layer of computational complexity. 

Random behavior is not desirable under unbounded rationality. However, an 
agent’s knowledge is bounded since unbounded rationality would dictate unbounded 
computational resources. With bounded rationality an algorithm can get stuck in a 
neighborhood of the solution space. Introducing random behavior allows the 
algorithm to move from such a neighborhood, which results in occasional random 
walks across the solution space. In M(x), every fixed number of rounds in the POST 
procedure (call this sell_random) a computed number of random items are sold back 
by every consumer. A value of sell_random that worked best was about twice the 
logarithm of the number of rounds with an upper bound around 30 and a lower bound 
around 9. 

Exchange Protocol. Not all transactions in an economy are done through the market 
place. These transactions do not require pricing mechanisms, but simply take place 
through direct trade. Without these trades, convergence to an optimal solution can 
take very long. For instance, much time and effort is saved by borrowing sugar from 
a neighbor vs. making a special trip to the store to buy it. In the POST procedure of 
M(x), every exchange rounds consumers swap some number of random items with 
their nearest neighbor, defined by the neighbors directly before and directly after 
them in the list of agents. The value of exchange that worked best was: 
exchange ^ sell_random and 1 < exchange < 4. Having a small value of exchange 
gives the opportunity for all items to be placed in all knapsacks. 
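
A direct swap between neighboring consumers can be sketched as below. The text does not say how swapped items are chosen beyond being random, nor whether a swap is checked for fit; this sketch assumes one random item per side and rejects swaps that would overfill either knapsack. The function name and data are hypothetical.

```python
import random
from typing import List, Set


def exchange_with_neighbors(holdings: List[Set[int]], sizes: List[int],
                            capacities: List[int], rng: random.Random) -> None:
    """Each consumer tries to swap one random item with the next consumer in the list."""
    for i in range(len(holdings) - 1):
        k = i + 1  # pairing (i, i + 1) covers both the neighbor before and the one after
        if not holdings[i] or not holdings[k]:
            continue
        a = rng.choice(sorted(holdings[i]))
        b = rng.choice(sorted(holdings[k]))
        load_i = sum(sizes[j] for j in holdings[i]) - sizes[a] + sizes[b]
        load_k = sum(sizes[j] for j in holdings[k]) - sizes[b] + sizes[a]
        # Assumed feasibility check: skip swaps that would overfill a knapsack.
        if load_i <= capacities[i] and load_k <= capacities[k]:
            holdings[i].remove(a)
            holdings[i].add(b)
            holdings[k].remove(b)
            holdings[k].add(a)


# Hypothetical toy state: consumer 0 holds item 0, consumer 1 holds items 1 and 2.
sizes = [5, 4, 3, 2]
capacities = [6, 9]
holdings = [{0}, {1, 2}]
exchange_with_neighbors(holdings, sizes, capacities, random.Random(0))
```

No price is computed for the swap, which is what makes this step cheaper than routing the same items through the auctioneer.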







Complexity of M(x). The time complexity of M(x) is O(numrounds × n × m) (or 
O(n log n) for sort) and the space complexity is O(n × m). For a complete listing and 
specification, email smolinskil@llnl.gov. 



5 Experimental Results 



This section analyzes the results from running M(x) and two other algorithms, a 
branch and bound algorithm, BB, and a simple first fit greedy algorithm with a 
shuffle, G(x), over a complete test suite of data. The test data generation algorithm 
used is described in Martello and Toth [15]. Two groups were generated: strongly 
correlated and uncorrelated. In the strongly correlated data, item values were tied to 
item sizes as specified in [15]. The uncorrelated data had values and sizes that were 
both uniformly random in [1, 100]. The size of the test data was limited to N = 200 
and M = 10. The reason is that the calculation of the upper bounds in BB required 
O(nC) space, where C is the size of the knapsack in the surrogate relaxation problem; 
C grows in size as n and m grow, and memory was limited by the development 
environment. 

[Figure 2: % time (0 to 350) plotted against % error (0.50%, 1%, 5%) for M(x) and G(x).] 

Fig. 2. Average Percent Running Times of G(x) and M(x) vs. BB over 20 Data Sets with 
Decreasing Error Constraints. 

[Figure 3] 

Fig. 3. Average Percent Running Time of M(x) vs. BB(-1) on Correlated Data over 20 Data 
Sets to Get within 1% Error (Log Scale). 

From Figure 3, the relative running time of M(x) vs. BB decreases as the number 
of items increases. The same does not appear to happen with respect to an increase in 
the number of knapsacks. The relative running times appear to grow polynomially 
(and may decline after m ~ [8,10]) as m increases. In general, as the problem size 
grows, M(x)'s running times relative to BB decline. BB works very well on smaller 
data sets. When the problem size is small, BB performed better than M(x): M(x) had 
average running times as much as 33 times those of BB to get within 1% error. This 
is due in part to the data itself, where a solution within 1% of optimal meant the 
optimum. On larger test data M(x) performed best. In most cases M(250) was able to 
get within 1% of the optimal solution (in many cases the median percent error was 
0%), and the running times for M(250) were a fraction of those of BB on the larger 
data sets. 






There was a noticeable difference in performance between correlated and 
uncorrelated data sets for both M(x) and G(x). With uncorrelated items, M(x) and 
G(x) were generally able to reach good solutions in one round for large n. The 
benefits of sorting the items in descending preciousness are realized with uncorrelated 
items since the most precious items are put into the solution space in the first round, 
which enables the algorithms to immediately identify good solutions. M(x) 
performed better than G(x) on correlated data, especially with tighter error constraints 
(Figure 2). Test data showed that G(x) is much less likely to converge to a good 
solution within a limited number of rounds. M(x) almost always found good 
solutions in fewer rounds than G(x). 

6 Conclusions and Future Work 

This paper contributes a novel approach to solving the 0-1 multiple knapsack problem 
that is simple, robust, and easy to implement. A model is provided that can guide the 
development of algorithms to solve the 0-1 multiple knapsack problem. One 
particularly effective and efficient algorithm (M(x)) was implemented. M(x) uses 
rounds where agents sell back items, making it effective at finding near optimal 
solutions. This distinguishes it from simple greedy algorithms by allowing the 
algorithm to traverse the solution surface. The use of agents prevents consideration 
of infeasible solutions while traversing the solution space, which allows M(x) to 
traverse sparse solution surfaces effectively. A computationally simple bidding 
strategy was chosen for M(x), thus making the algorithm efficient. The test results 
confirm that M(x) performs very well. M(x) was generally able to find good solutions 
in a fraction of the time of BB on larger data sets, and with much less memory. M(x) 
performed better than G(x) on harder problems and converged to good solutions in 
fewer rounds than G(x). More work can be done in developing different algorithms 
that use the market model. This includes trying new bidding strategies and different 
market mechanisms. Also, developing parallel algorithms based on M(x) could yield 
faster algorithms. Friedman and Oren [9] argue that distributed resource allocation is 
logarithmic in m and linear in n, which suggests that parallelization would be possible. 
It is unlikely that BB can be parallelized since it is a depth-first search algorithm, 
which is inherently sequential [19]. It is also unlikely that G(x) can be parallelized 
since first fit decreasing bin packing is P-Complete¹. 

References 

1. Aly, S. (1994). Object-Agents: A New Role of Design Objects in CAD Systems, 
in Pohl, J. (Ed.) Advances in Computer-Based Building Design Systems, 7th 
International Conference on Systems Research, Informatics and Cybernetics, 
Baden-Baden, Germany. 



¹ See http://www.i.kyushu-u.ac.jp/~seki/P-complete/all/all.html for a list of P-complete 
problems. 







2. Chapman, D. (1987). Planning for Conjunctive Goals. Artificial Intelligence, 
Vol. 32(3). pp. 333-377. 

3. Cheng, J. Q. and M. P. Wellman (1998). The WALRAS Algorithm: A 
Convergent Distributed Implementation of General Equilibrium Outcomes. 
Computational Economics, Vol 12. pp. 1-24. 

4. Clearwater, S. H. (1996). Market Based Control, A Paradigm for Distributed 
Resource Allocation. World Scientific, New Jersey. 

5. Davis, R. and R. G. Smith (1983). Negotiation as a Metaphor for Distributed 
Problem Solving. Artificial Intelligence, Vol. 20. pp. 63-109. 

6. Durfee, E. H. and V.R. Lesser (1989). Negotiating Task Decomposition and 
Allocation Using Partial Global Planning. In Gasser, L. and M. N. Huhns (Eds.), 
Distributed Artificial Intelligence, Vol. II. pp. 229-243. 

7. Eastman, C., S. Chase, and H. Assal (1992). System Architecture for Computer 
Integration of Design and Construction Knowledge. Building Systems 
Integration Symposium, A/E/C Systems, Dallas, TX. 

8. Engelbrecht-Wiggans, R., M. Shubik and R. M. Stark (Eds.) (1983). Auctions, 
Bidding, and Contracting: Uses and Theory. New York University Press, New 
York. 

9. Friedman, E. J. and S. S. Oren (1995). The Complexity of Resource Allocation 
and Price Mechanisms Under Bounded Rationality. Economic Theory, Vol. 6. 
pp. 225-250. 

10. Goldberg, D. E. (1989). Genetic Algorithms in Search, Optimization, and 
Machine Learning. 

11. Huberman, B. A. (1988). The Ecology of Computation. Elsevier Science 
Publishers B. V., North-Holland, Amsterdam. 

12. Huberman, B. A. (1995). Distributed Computation as an Economic System. 
Journal of Economic Perspectives, Vol. 9(1). pp. 141-152. 

13. Kirkpatrick, S., C. D. Gelatt, M. P. Vecchi (1983). Optimization by Simulated 
Annealing. Science Vol. 220. pp. 671-680. 

14. Markland, R. E. (1989). Topics in Management Science, Third Edition. John 
Wiley and Sons, Inc., New York. 

15. Martello, S. and P. Toth (1990). Knapsack Problems: Algorithms and Computer 
Implementations. J. Wiley & Sons. 

16. Michalewicz, Z. (1992). Genetic Algorithms + Data Structures = Evolution 
Programs. Springer-Verlag. 

17. Miller, M. and E. Drexler (1988). Markets and Computation: Agoric Open 
Systems. In Huberman (1988). pp. 133-176. 

18. Pohl, J., L. Myers, A. Chapman, K. Pohl, J. Primrose, and A. Wozniak (1997). 
Decision-Support Systems: Notions, Prototypes, and In-use Applications. TR 
CADRU-11-97, CAD Research Center, Cal Poly San Luis Obispo, CA. 

19. Reif, J.H. (1985). Depth-first Search is Inherently Sequential. Inf. Process. 
Lett., Vol. 20. pp. 229-234. 

20. Roth, A. E. (1985). Game-Theoretic Models of Bargaining. Cambridge 
University Press, Cambridge. 







21. Sandholm, T. (1993). An Implementation of the Contract net Protocol Modeled 
on Marginal Calculations. In Proceedings of the Eleventh National Conference 
on Artificial Intelligence, pp. 256-262. 

22. Scarf, H.E. (1984). The Computation of Equilibrium Prices. In Scarf, H.E. and 
J.B. Shoven (Eds.), Applied General Equilibrium Analysis, pp. 415-492. 
Cambridge University Press, Cambridge. 

23. Steiglitz, K., M. L. Honig, and L. M. Cohen (1996). A Computational Market 
Model Based on Individual Action. In Clearwater (1996). pp. 1-27. 

24. Stonebraker, M., R. Devine, M. Kornacker, W. Litwin, A. Pfeffer, A. Sah, and C. 
Staelin (1994). An Economic Paradigm for Query Processing and Data 
Migration in Mariposa. In Proceedings of the Third International Conference on 
Parallel and Distributed Information Systems, pp. 58-67. Austin, TX. 

25. van Laarhoven, P. J. M. and E. H. L. Aarts (1987). Simulated Annealing: 
Theory and Applications. Kluwer Ac. Publ., Dordrecht. 

26. Walras, L. (1954). Elements of Pure Economics. Allen and Unwin. English 
translation by William Jaffe, originally published in 1874. 

27. Waldspurger, C. A., T. Hogg, B. A. Huberman, J. O. Kephart, and W. S. 
Stornetta (1992). Spawn: A Distributed Computational Economy. IEEE 
Transactions on Software Engineering, Vol. 18(2). pp. 103-117. 

28. Wellman, M. P. (1993). A Market-Oriented Programming Environment and its 
Application to Distributed Multicommodity Flow Problems. Journal of Artificial 
Intelligence Research, Vol. 1. pp. 1-23. 

29. Yemini, Y. (1981). Selfish Optimization in Computer Networks. Proceedings of 
the 20th IEEE Conference on Decision and Control, San Diego, pp. 281-285. 

30. Zlotkin, G. and J. S. Rosenschein (1996). Mechanisms for Automated 
Negotiation in State Oriented Domains. Journal of Artificial Intelligence 
Research, Vol. 5. pp. 163-233. 




Design and Development of Autonomous Intelligence 

Smart Sensors 



R. Kollum^’^, R. Loganantharaj^, S. Smith\ P. Bayyapu\ G. LaBauve\ 
James Spencer^, Jeffery Hooker^, Steve Simmons^, T. Hebert^ 

^ Apparel-CIM Center, University of Louisiana at Lafayette 
Lafayette, Louisiana 70504-4932 

^ Center for Advanced Computer Studies, University of Louisiana at Lafayette 
Lafayette, Louisiana 70504-4330 

^ Intelligent Machine Concepts, LLC, 125 Mallard Street, Suite A 
Saint Rose, Louisiana, 70087 



Abstract. This paper presents the results of an on-going research investigation 
being conducted at the University of Louisiana at Lafayette, in collaboration 
with its industrial partner, Intelligent Machine Concepts (IMC). IMC, an 
advanced technology developer, manufactures laser sensors that enable 
automated non-contact inspection, real-time tracking, and metrology 
applications. The central focus of the research project is to enhance the 
functionality of this commercially successful sensor technology through the 
development of autonomous intelligence capabilities of perception, decision 
and action. This paper presents an overview of the integrated software / 
hardware solution towards the development of smart sensors. 



1. Introduction 

As the society we live in becomes more industrialized, our reliance on automated 
devices and equipment is ever increasing. These range from security systems for home 
automation to ATMs for automated banking, and from process control systems in 
chemical plants to real-time control systems in nuclear installations. Each of these 
diverse applications is characterized by processes controlled by sensors. This research 
addresses the pervasive need for enhanced reliability, robustness, responsiveness, and 
safety, made possible by developing sensors with autonomous intelligence capabilities, 
viz., “smart sensors”. To accomplish this goal, the University of Louisiana at 
Lafayette has established a strategic partnership with Intelligent Machine Concepts, a 
Louisiana-based company that develops sensors for applications ranging from 
manufacturing to metrology. This research builds upon encouraging preliminary 
results as part of an on-going research investigation toward the development of 
intelligent machines. This research, funded by Louisiana Board of Regents and the 
industry, is aimed at the development of a software / hardware control architecture to 
impart autonomous intelligence capabilities to artificially engineered systems, such as 
sensors. A micro-controller realization of the developed software architecture will 
result in the evolution of smart sensors that may be readily integrated with existing 
controllers in a modular, plug-and-play fashion. Embedded control systems are a 
means to this end. 

R. Loganantharaj et al. (Eds.): IEA/AIE 2000, LNAI 1821, pp. 306-315, 2000. 

© Springer-Verlag Berlin Heidelberg 2000 







The central focus of this research is thus to impart intelligence capabilities to IMC’s 
sensor systems enabling their enhanced applicability to such industrial manufacturing 
processes as automated non-contact inspection, real-time tracking, and metrology. 
The dynamic and competitive nature of manufacturing in the emerging global 
economy can be characterized by one word: Change. In this volatile market-driven 
environment, enterprise agility is the order of the day. Increasingly, producers of 
goods and services must be able to respond quickly to fleeting market opportunities 
that emerge, peak and then disappear in ever-shortening time frames. Manufacturers 
who desire to survive and thrive in this new competitive environment must therefore 
become more Agile, which by definition entails systems that are flexible and 
dynamically reconfigurable [1]. 

Problem Statement: Flexibility as it pertains to automated manufacturing is a well- 
understood concept as embodied in traditional multifunctional, re-programmable 
robotic systems. However, what does dynamic reconfigurability, so essential to the 
realization of agile manufacturing, entail? In order for manufacturing systems to be 
dynamically reconfigurable, they must be able to rapidly and automatically effect the 
internal modifications needed to accommodate broad product and process variations. 
Unfortunately, flexibility and dynamic reconfigurability of manufacturing systems are 
hindered by the inflexibility of, and the tedious manual set-up required by, current 
sensor technologies. 

The paper presents an overview of a solution to this problem that is currently under 
development at the University of Louisiana at Lafayette and Intelligent Machine 
Concepts. This paper is organized as follows: Section 2 presents an outline of the 
existing sensor systems and identifies the limitations of the existing sensor 
technologies. Section 3 outlines the design of the control architecture for the 
development of the smart sensor. Section 4 briefly describes the hardware realization 
of the control architecture, while Section 5 concludes the paper. 

2. The Current Situation 

Deterministic Sensors are the Lowest-Level Impediment to Agile Manufacturing: 
Unlike industrial robots and other computer-based flexible manufacturing equipment, 
the vast majority of commercially available sensors do not lend themselves to rapid 
re-calibration and re-configuration. When a typical sensor must be modified to 
accommodate changing products, processes and ambient conditions, intervention of 
an expert is often necessary in order to re-calibrate and re-deploy the sensor 
successfully. Anyone who has worked with sensors in an automated manufacturing 
system has experienced the frustrations and difficulties of trying to keep track of 
process sensing parameters and sensor capabilities while simultaneously trying to 
optimize performance of the system. This can be an expensive, time-consuming and 
complex endeavor, even for the experienced and skilled technician. The lack of this 
integrated “intelligence” capability in commercially available sensor systems is one of 
the most significant bottlenecks in the realization of agile manufacturing. 

Furthermore, we must also consider that most sensors are essentially dumb 
deterministic detectors or emitter-detectors equipped with discrete signal conditioning 
and conversion electronics. Understanding exactly why a particular sensor is not 




308 R. Kollum et al. 



yielding the desired performance requires a deeper understanding of the physics of the 
sensor than most technicians possess. Additionally, there are numerous subjective 
factors, which must be resolved before a sensor will perform as desired. It is in this 
subjective phase, where experience and expertise are predominant factors driving the 
speed and effectiveness of the setup or reconfiguration of a particular installation. 
This is the “art” side of sensor deployment. Our focus is to evolve this “art” into a 
“science” through the strategic integration of expert knowledge into a framework and 
architecture that supports “smart-sensors.” In this research, we intend to address the 
pervasive need in the industry for smart-sensor systems that enable manufacturing 
flexibility and dynamic reconfigurability. What we propose is to develop a scientific 
framework for capturing our expert knowledge in the area of sensor deployment, and 
integrating that knowledge into the architecture of the sensor. The smart sensors will 
then have the ability to interact with their environment, the robot controller, host 
knowledge/data bases, other computer applications and the human operator. We have 
envisioned, and will develop through this research, a control architecture that will 
enable us to integrate various physical sensors with standard, plug-and-play 
embedded controllers, enabling the development and evolution of autonomous smart 
sensors. It is our view that the development of products with such capabilities will 
have the potential to revolutionize the robotics and automation industry and 
subsequently the manufacturing sector. 

The ultimate goal of this effort is the successful formulation and validation of a 
framework for implementing autonomous intelligent sensors enabling flexible and 
dynamically reconfigurable agile manufacturing enterprises. It is to accomplish this 
goal that the University of Louisiana at Lafayette has established a strategic 
partnership with Intelligent Machine Concepts (IMC), a Louisiana-based company 
that develops sensors for manufacturing applications. 





IMC DynaTrac™ Sensor Technology: Unlike traditional displacement and offset 
sensors, which are typically high-precision non-contact sensors used for proximity 
detection and gauging applications, IMC’s family of DynaTrac™ non-contact surface 
tracking and linear offset sensors has been designed and optimized for motion control 
and robotic surface contour following and measurement applications. 
Two of IMC’s commercially proven, mature sensor technologies are featured below: 



Fig. 1. DynaTrac™ LOS Sensor 



Fig. 2. DynaTrac™ NOS Sensor 



The DynaTrac™ Linear Offset Sensor (LOS), shown in Figure 1, is a high 
performance, high-bandwidth, precision laser-displacement sensor utilized to measure 



Design and Development of Autonomous Intelligenee Smart Sensors 309 



linear offsets from surfaces. Traditional laser-measurement devices tend to offer very 
high precision over a limited operating range. The DynaTrac™ LOS, however, is 
optimized for dynamic tracking applications. It can be factory customized to meet 
specific user application requirements, and typically provides a high-bandwidth 
feedback output and offset range tolerance suitable for mechanical servo-control. 

The DynaTrac™ Normality Offset Sensor (NOS), shown in Figure 2, enables real-time, 
non-contact determination of planar surface orientation and offset distance at 
an angular resolution of up to 0.001 degrees and a nominal linear resolution of ±0.0001”. 
This sensor is robust like the LOS, and can be used with highly specular or diffuse 
surfaces and requires no special targeting fiducials on the area being monitored. The 
sensor can be used to measure and/or track surface orientation in a manufacturing or 
metrology application. It can also be mounted to a robot and used to dynamically 
maintain the orientation of a tool with respect to a surface in a contour following, 
surface processing application. 

What limits the widespread and effective use of existing sensor 
technologies such as IMC’s sensors is the significant amount of customization, 
development, and set-up needed to optimize performance across the range of 
applications in which the sensors can be deployed. The objective of this research is to 
develop autonomous “smart” sensors that can be used with commercially 
available robots, vision systems and other automation components to realize 
sensor-driven systems capable of real-time response and dynamic 
reconfigurability. 



3. Design of the Smart Sensor Control Architecture 

To realize the above vision, the smart-sensor must possess the following capabilities: 

Responsiveness: To respond to the changes in part 
geometry and ambient conditions that it perceives, the smart sensor must be able to act 
autonomously. It should therefore have an autonomous 
“perception-decision-action” capability that minimizes its reliance on the 
supervisory controller (robot controller) for command, communication, control and 
intelligence (C³I). 

Adaptivity: To interact with a human operator and/or 
expert knowledge bases and so enhance its knowledge, the smart sensor needs the ability to 
learn, as shown in Figure 3. In addition, it will need the ability to 
upgrade, enhance and modify its knowledge base, inference logic, and control 
algorithms based on learning from experience and interaction with experts. Further, 
the ability to learn is essential for “sensor reconfigurability” in those situations in 
which the sensor needs to provide an operator with the expertise necessary for successful 
recalibration. The smart sensors carry the knowledge encoded into them 
by the expert to the end-user installation. 

Error Detection, Identification and Recovery: The ability to autonomously detect and 
identify errors, and to assist in troubleshooting during error recovery. 




310 R. Kollum et al. 




IMC’s Normality Offset Sensor is a modulated laser displacement sensor, which 
maintains normality with respect to the surface under consideration while maintaining 
a desired offset. The sensor module consists of three independent laser emitter and 
detector pairs. The strength of the return signals (voltages) from the three sensors 
determines the distance and orientation of the object with respect to the sensor. The 
sensor mounted on a robot can be used to maintain the tool center point (TCP) at a 
particular pose with respect to the surface. This non-contact sensor may be used for 
three-dimensional curvilinear surface inspection, metrology, real-time tracking and 
contour following in the complete absence of an a priori model of the object and the 
environment. 










Fig. 4. Schematic of the system control architecture 

Figure 4 shows the control system architecture of a NOS-driven robotic system. The 
Smart Sensor controller determines the desired position of the robot tool center 
point, at which the values of offset and orientation with respect to the surface 
under consideration are estimated to be $z_{est}$ and $\theta_{est}$, respectively. This is based on the 
logical model of the sensor that resides in the Knowledge base of the controller. These 
values are fed to the robot controller, which drives the manipulator to the desired 
point. The sensor measures the actual values of $z$ and $\theta$ with respect to the surface. 
Based on the feedback from the sensor, $(z_{act}, \theta_{act})$, the smart sensor controller 
determines any corrective action, if necessary. Once the desired configuration is 
reached, the next desired position is determined, enabling the robot to perform 
complex tracking and contour following applications. 
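As an illustration only, the estimate-move-measure-correct cycle described above can be sketched in Python. The names `Pose`, `needs_correction` and `track`, the tolerance values, and the callback signatures are hypothetical and are not taken from the controller described in this paper; the robot and the sensor are represented by caller-supplied functions.

```python
from dataclasses import dataclass


@dataclass
class Pose:
    z: float      # offset from the surface
    theta: float  # orientation with respect to the surface, in degrees


def needs_correction(est, act, z_tol, theta_tol):
    """True when the measured pose deviates from the estimate beyond tolerance."""
    return abs(act.z - est.z) > z_tol or abs(act.theta - est.theta) > theta_tol


def track(estimates, measure, move_to, correct,
          z_tol=0.01, theta_tol=0.1, max_retries=3):
    """Visit each estimated pose and correct it using sensor feedback.

    measure(), move_to(est) and correct(est, act) are hypothetical callbacks
    standing in for the sensor, the robot controller and the corrective plan.
    """
    log = []
    for est in estimates:
        move_to(est)                  # robot controller drives the manipulator
        act = measure()               # sensor reports the actual (z, theta)
        retries = 0
        while needs_correction(est, act, z_tol, theta_tol) and retries < max_retries:
            correct(est, act)         # corrective action decided by the controller
            act = measure()
            retries += 1
        log.append((est, act, retries))
    return log
```

The retry limit is a design choice of this sketch: it keeps the loop from running forever when the discrepancy comes from a wrong estimate rather than from a positioning error.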








Fig. 5. Functional scheme of the intelligent control architecture 

The architecture of the “smart sensor”, or any intelligent machine [2], consists of 
three basic modules: Perception, Decision and Action that interact with the 
knowledge-base resident within the system [3], as shown in Figure 5. The term 
knowledge base is used in a loose sense to include different types of knowledge 
representation schemes including episodes of past experience, reactive plans, 
declarative knowledge, compiled knowledge etc. An effective architecture is very 






much dependent on both the application and the technology selected for implementing 
the architecture. (For example, ALVINN, an adaptive software agent capable of 
controlling the CMU NAVLAB vehicle, had its perception and execution modules 
fused and implemented using a back-propagation neural network.) In our particular 
application, the smart sensor, upon interaction with the environment, determines the 
orientation and offset of the robot with respect to the surface by fusing information 
from the three lasers. Probabilistic models and/or Kalman filters may be implemented 
for multi-sensor data fusion. 
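To make the fusion step concrete, the following Python sketch recovers offset and tilt from three range readings by fitting a plane through the three measured points. It is a purely geometric illustration under stated assumptions: the return-signal voltages are assumed to have already been converted to ranges, the emitter coordinates are assumed known, and the function name `surface_pose` is hypothetical. It is not the sensor's actual model, and it does not include the probabilistic or Kalman filtering mentioned above.

```python
import math


def surface_pose(points):
    """Estimate (offset, tilt_deg) of a planar surface from three readings.

    points: three (x, y, d) tuples, where (x, y) is the assumed emitter
    position in the sensor plane and d is the range measured along the
    sensor axis.
    """
    (x1, y1, d1), (x2, y2, d2), (x3, y3, d3) = points
    # Two vectors lying in the measured surface.
    ux, uy, uz = x2 - x1, y2 - y1, d2 - d1
    vx, vy, vz = x3 - x1, y3 - y1, d3 - d1
    # Surface normal as the cross product u x v.
    nx = uy * vz - uz * vy
    ny = uz * vx - ux * vz
    nz = ux * vy - uy * vx
    norm = math.sqrt(nx * nx + ny * ny + nz * nz)
    if norm == 0.0 or nz == 0.0:
        raise ValueError("emitters are collinear or surface is parallel to the axis")
    # Tilt is the angle between the surface normal and the sensor axis.
    tilt_deg = math.degrees(math.acos(min(1.0, abs(nz) / norm)))
    # Range at the sensor axis (x = y = 0), from the plane equation.
    offset = d1 + (nx * x1 + ny * y1) / nz
    return offset, tilt_deg
```

With equal readings on all three emitters the sketch returns zero tilt, which corresponds to the sensor being normal to the surface.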

These perceived values are reported to the Decision module, which evaluates the 
situation by comparing the actual values with the estimated values. A 
discrepancy between them results in the formulation of a corrective plan. In the event 
of an error, the corrective action could consist of: i.) moving the robot to its 
desired location, in the event of a robot positioning error; or ii.) re-evaluating the 
estimates of position and orientation, in the event of an error in the system 
Knowledge base. The Decision module is responsible for problem solving, goal 
selection, and decision making: it formulates the next plan, evaluates the plan, 
calculates a new set of estimated offset and orientation values, and reports the plan to 
the Action module. 

The Action module executes the plan, the outcome of which is monitored and 
evaluated for success or failure. In either case, the system Knowledge base and 
the component rule bases are updated according to the success or failure of the corrective 
plan, thereby enhancing the intelligence of the system. In the agent architecture, the 
knowledge base (KB) is assumed to hold different types of knowledge, namely 
declarative, compiled, episodic, heuristic, compiled plans, etc. It is not feasible to 
build a knowledge base so elaborate that it makes the agent behave completely 
autonomously. Consequently, machine-learning techniques are used to enrich the 
knowledge base of the system [4-6]. 

4. Design of the Hardware Controller 

The Smart Sensor controller must be open, modular, scaleable, and economical. 
Key attributes of these features consist of the following: 

Open: The controller needs to allow integration of commercial off-the-shelf (COTS) 
hardware and software components into a “de facto” standard environment. The 
proliferation of PCs enables the use of Microsoft Windows operating systems such as 
Windows NT and Windows CE. With the available “open” technologies (IBM PC 
Compatibles), a comprehensive engineering effort is needed to integrate components 
into a functional control system. A critical factor to increase the level of openness is 
the definition of a common set of Application Programming Interfaces (APIs). With 
the availability of common APIs and products conforming to the APIs, it is possible 
for users to reconfigure control systems without major effort, making “plug and play” 
and “scaleability” a reality. 

Modularity: There are times when a module of a control system needs to be replaced 
in order to provide additional capabilities. Modularity defines the ability to replace a 







component or a module in a control system without having to devote a great amount 
of engineering effort to make the control system functional. The end-users would like 
to have the ability to “plug” in a new sensor or device and be able to “play” with 
minimal effort. 

Scaleability: It is necessary to increase or decrease the functionality of the control 
systems based on changes in manufacturing processes or requirements. The 
scaleability of a control system allows control modules to be added or removed from 
the control system and provides appropriate capability to match application needs. 

Economy: It is necessary to minimize the total cost of the system over the life-cycle 
of the control system, as opposed to just the cost of initial purchase. In this context, 
open, modular and scaleable control systems allow incremental upgrades and easy 
integration of components, reducing the cost of modifying a control system. 

The University of Louisiana at Lafayette is one of the select few universities in the 
U.S. that serve as members of the Open Modular Architecture Controls (OMAC) 
Users Group [7]. OMAC was formed to create an organization through which 
companies could work together to: i.) Establish a repository of open architecture 
control requirements and operating experience from users, software developers, 
hardware builders and OEMs, ii.) Facilitate accelerated convergence of industry and 
government developed APIs (Application Program Interfaces) to a set that satisfies 
common requirements, iii.) Promote open architecture control development among 
control builders, and iv.) Derive common solutions collectively for the development, 
implementation, and commercialization of open architecture control technologies. 
Being a member of this group, the University aims to bring standardization and 
openness to emerging control technologies, as evident in the design of the sensor 
control architecture. 

Standard interfaces (common APIs) for various control software modules, 
device-level networks, etc. will be developed. The API of each module defines its 
communication with other modules within the controller. With the use of the 
appropriate API, it is feasible to replace the existing DSP within a functional control 
system with a newer, faster DSP, and the total system should be functional again after 
simple integration and validation. The common API enables not only the simple “plug 
and play” requirement, but also the scaleability of the controller. Modules may be 
added or removed from the control system based on the application. 
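The idea of swapping modules behind a common API can be illustrated with a short Python sketch. Everything named here (`SignalProcessor`, `BaselineDSP`, `FasterDSP`, `Controller`) is hypothetical and stands in for whatever interfaces are eventually defined; the two "DSP" classes merely round their input to different precisions so that the replacement is observable.

```python
from abc import ABC, abstractmethod


class SignalProcessor(ABC):
    """Hypothetical common API implemented by every signal-processing module."""

    @abstractmethod
    def process(self, voltages):
        """Return processed readings for a list of raw voltages."""


class BaselineDSP(SignalProcessor):
    def process(self, voltages):
        return [round(v, 2) for v in voltages]


class FasterDSP(SignalProcessor):
    def process(self, voltages):
        return [round(v, 4) for v in voltages]


class Controller:
    """Holds named modules; any module honoring the API can be plugged in."""

    def __init__(self):
        self._modules = {}

    def plug(self, name, module):
        if not isinstance(module, SignalProcessor):
            raise TypeError("module does not implement the common API")
        self._modules[name] = module   # replaces any module with the same name

    def unplug(self, name):
        self._modules.pop(name)

    def run(self, name, voltages):
        return self._modules[name].process(voltages)
```

Because the controller depends only on the interface, plugging `FasterDSP` under the same name replaces `BaselineDSP` without any change to the calling code, which is the "plug and play" property in miniature.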

Preliminary design of the hardware control system consists of the following 
components: 

Embedded Controllers: The STD 32® Bus combines small, industrial strength 
architecture with the functionality and performance of high-end personal computers. 
This versatile 32-bit scalable computer is suited for demanding real-time control and 
data acquisition applications where system size and cost are important priorities. PC 
software capability and a large supply of industrial I/O make STD 32 an ideal choice 
for embedded industrial applications. 

• ZT 8908: Ziatech's new ZT 8908 SBC with Pentium Processor is designed for 
industrial automation applications requiring Pentium processor-level performance 







in a small, reliable format. With a PC/AT peripheral set, IDE drive and local bus 
PCI options, plus the capability to run PC operating systems and software, this 
new SBC combines the functionality of a PC with the embedded features needed 
for industrial automation. 

Operating Systems: QNX is a Unix-like operating system for use in demanding 
control applications where real-time multi-tasking performance is an important 
criterion. Combining QNX features with Ziatech STD32 hardware results in a rugged, 
compact, reliable solution for embedded systems. 

• QNX Neutrino Real-time Operating System: The Neutrino microkernel delivers 
core real-time services for embedded applications, including message passing, 
POSIX thread services, mutexes, condition variables, semaphores, signals, and 
scheduling. It can be extended to support POSIX message queues, file systems, 
networking, and other OS-level capabilities with simple plug-in, service- 
providing modules. 

Control Networks: LonWorks Control Modules reduce the time and development 
costs for developing LonWorks nodes. Echelon's control modules include a Neuron 
Chip, Echelon transceiver, memory, and clock oscillator in one compact module. 

• XF-1250 LonBuilder Developer Kit: The LonBuilder Developer's Workbench 
integrates a complete set of tools for developing LonWorks based nodes and 
systems. These tools include an environment for developing and debugging 
applications at multiple nodes, a network manager to install and configure these 
nodes, and a protocol analyzer to examine network traffic to ensure adequate 
network capacity. 

An extensive study of the state-of-the-art in real-time operating systems, 
programming languages, hardware platforms / bus architectures, and communication 
interfaces is currently being performed. Some of the specific product technologies 
that are under investigation include, but are not limited to: 

• Operating systems: WindowsCE, QNX, VxWorks; 

• Programming Languages: C++, Java, Parallel C; 

• CPU Hardware Platform / Bus Architectures: Standard interfaces consisting 
of the common computer backplanes such as ISA (Industry Standard 
Architecture), VME (VERSA Module Eurocard), PCI (Peripheral Component 
Interconnect), etc.; 

• Communication Interfaces: IEEE P1451.2 Standard for Smart Transducer 
Interface for Sensors and Actuators, LonWorks, Ethernet, Controller Area 
Network solutions. 



5. Conclusions 

This research builds upon the success of Intelligent Machine Concepts’ non-contact 
sensor technologies and UL Lafayette’s capabilities in intelligent control. IMC’s 
DynaTrac™ family of laser-based measurement and tracking sensors enables industrial 







robots to perform accurate, non-contact, real-time, three-dimensional tracking and 
measurement of complex curvilinear surfaces. 

The central objective of the research is to develop a framework for realizing 
autonomous “perception-decision-action” capabilities for IMC’s sensors. A prototype 
“smart sensor” that uses a neuro-fuzzy controller is currently under development. The 
next step is to develop, integrate, and demonstrate smart sensors with robots and other 
manufacturing equipment to enable workcell level agile manufacturing. 



References 

1. Goldman, S., R. Nagel, and K. Preiss, "Agile Competitors and Virtual 
Organizations", New York: Van Nostrand Reinhold, 1995. 

2. Saridis, G.N., "Analytic Formulation of the Principle of Increasing Precision 
with Decreasing Intelligence for Intelligent Machines", Automatica, Vol. 25, 
No. 3, pp. 461-467, 1989. 

3. Albus, J., "Outline of a Theory of Intelligence", IEEE Transactions on Systems, 
Man and Cybernetics, vol. 21, no. 3, pp. 473-509, May 1991. 

4. Pang, G., "A Framework for Intelligent Control", JIRS, Vol. 4, pp. 109-127, 
1991. 

5. Antsaklis, P., "Defining Intelligent Control", Report of the Task Force on 
Intelligent Control, IEEE Control Systems Magazine, pp. 4-5 & 58-66, June 
1994. 

6. Tzafestas, S.G., "Knowledge-Based System Diagnosis, Supervision & Control", 
Plenum Press, 1989. 

7. -, Open Modular Architecture Controls Group, http://www.arcweb.com/omac. 




ADDGEO: An Intelligent Agent to Assist Geologist 
Finding Petroleum in Offshore Lands 



Ana Cristina Bicharra Garcia, Paula Marisa Maciel, Inhauma Neves Ferraz 

Instituto de Computação 
Universidade Federal Fluminense - Brazil 
Rua Passo da Pátria, 156/sala 326/BLE 
Niterói - RJ - Brasil 

{bicharra, paula, ferraz}@dcc.ic.uff.br 
http://www.addlabs.uff.br 



Abstract. In this paper, we present an intelligent agent (ADDGEO) that assists 
geologists in identifying rock constituents during thin section analysis. ADDGEO 
is a hybrid tool that uses both a knowledge base and a neural net to recognize 
visual patterns in thin sections, either alone or with the user's participation. 
ADDGEO was recently deployed in a Brazilian oil company, where it has improved 
the geologists' task completion. In addition, it has shown potential 
as a training tool. 



1 Introduction 

Most Brazilian oil reservoirs lie in offshore areas. For this reason, Brazilian oil 
companies have devoted great effort to technologies that make it feasible to find 
and explore these rich and complex oil fields. 

The study needed to identify potential oil sites is very expensive in itself because it 
involves collecting samples of undersea rocks and performing acoustic and nuclear 
experiments. Although obtaining the data to analyze the field is very expensive, a 
wrong or even misleading diagnosis may be economically catastrophic. From the 
rock samples, geologists produce thin sections containing the relevant material, which are 
interpreted using special microscopes. The task is to identify the soil components, the 
probability and amount of oil, the depositional environment, the rock permeability and 
the difficulty of extraction. In summary, geologists, based on a visual analysis, 
diagnose the potential amount of oil and the difficulty of extracting it in a specific field. 
In addition to the visual interpretation, other methods are used to assist this task, such 
as electric, electromagnetic, seismic, and alpha-spectrometry experiments, to ratify or 
rectify the visual hypotheses. 

Visual thin section analysis is a very effective method, but it is subjective and 
dependent on the expert’s point of view. In addition to being criticized for its 
subjectivity, the method is at risk because of the limited number of 
experts in the area. The availability of expertise within the company, together with the need 
to keep this know-how “in house”, created a good opportunity to build an 
intelligent assistant agent based on knowledge-based system and neural network 
technologies. 



R. Loganantharaj et al. (Eds.): IEA/AIE 2000, LNAI 1821, pp. 316-321, 2000. 
© Springer-Verlag Berlin Heidelberg 2000 







This paper presents a hybrid system (symbolic and connectionist) based on the ADD 
model [1]. We also present a prototype system applied to the interpretation of carbonate 
rocks from Brazilian offshore areas. Initial results show that it is effective in assisting 
novices to recognize the components of rock samples. 



2 Describing the Task 

To study the potential of an oil field, many wells are drilled to gather material for 
analysis. Thin sections are prepared containing relevant sample material taken from 
the exploration wells. The prepared thin sections are sent to experts for 
interpretation. The expert’s task consists of identifying the: 

- Type of non-bioclast grains; 

- Existence, percentage and type of bioclast grains; 

- Rock porosity and permeability percentages; 

- Cementation, fragmentation, compaction, neomorphism and other diagenetic 
events; and 

- Rock depositional environment. 

Experts work by observing thin sections under a microscope and recognizing visual 
patterns. They fill in a standard report form with the results of their analyses. Their work 
is done individually or in groups. 



3 The ADDGEO Model 

After extensive interviews with experts in the company, we realized that the identification 
process was subjective and complex, but could feasibly be partially automated. From the 
interviews and the company’s documentation we created a domain knowledge base (KB) 
to assist the diagnosis task. We implemented an agent that reasons using this KB. It 
observes the user’s actions and may offer help. It can also answer the user’s requests for 
guidance during the interpretation task. Because of ontological problems in 
interpreting visual patterns, symbolic assistance may be misleading. A neural net was 
therefore included as an additional feature of the ADD agent to identify bioclasts 
(“bioclastos”, remains of life forms). Fig. 1 illustrates the ADDGEO model. 

There are two ways to interact with the system: automatic diagnosis and cooperative 
diagnosis. In the first, the system identifies the rock components by itself through a 
three-layer neural network. In the second, users answer questions concerning the 
visual features displayed in the thin section image. Based on the answers, the system 
diagnoses the rock components and their depositional history. The neural net hint is very 
useful when users have difficulty perceiving the image. The symbolic approach 
helps users not only to reach an interpretation, but also to learn the process. 
Contextual help works as a tutorial, teaching users the meaning of the terms and the 
logical inferences the system is able to make. 








Fig. 1. ADDGEO model. 



3.1 Describing the Knowledge Base 

The knowledge in ADDGEO is represented through dependency parametric networks. 
Nodes are classification parameters and links are causal relations. In summary, we 
used a typical classification model. The entire net contains over 800 parameters. 

Since users are not obliged to provide all input data, the model must allow for 
unknown values in its inference rules, even when no information is provided at all. 
If the system does not reach a conclusion, it shows the user all possible 
hypotheses with visual samples from its database. 
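The treatment of unknown values can be illustrated with a small Python sketch. It is a toy stand-in for the dependency network, not the ADDGEO rule base: the function name `consistent_hypotheses`, the attribute names and the three example grain types are all hypothetical. An attribute the user has not supplied never rules a hypothesis out; only a known value that contradicts a rule does.

```python
def consistent_hypotheses(rules, observations):
    """Return the hypotheses not contradicted by what the user has entered.

    rules: dict mapping hypothesis -> dict of required attribute values.
    observations: dict mapping attribute -> observed value; attributes that
    are absent are treated as unknown.
    """
    remaining = []
    for hypothesis, required in rules.items():
        contradicted = any(
            attr in observations and observations[attr] != value
            for attr, value in required.items()
        )
        if not contradicted:
            remaining.append(hypothesis)
    return remaining


# Toy rule base, for illustration only.
TOY_RULES = {
    "ooid": {"shape": "spherical", "internal_structure": "concentric"},
    "peloid": {"shape": "spherical", "internal_structure": "none"},
    "bioclast": {"shape": "irregular"},
}
```

With no input at all, every hypothesis survives, which matches the behaviour described above: when no single conclusion is reached, all remaining hypotheses are shown to the user.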

Each successful analysis feeds the system’s case database, so the knowledge base 
grows with usage. As the case database grows, the system learns a bit more (rote 
learning). From time to time, a data mining or learning procedure extracts new rules 
from the recorded cases. Therefore, knowledge is not lost. 



3.2 Describing the Data Base 

ADDGEO’s databases are the soul of the system. Even though they only support the 
diagnosis task, they are fundamental for establishing credibility and for acting as the 
user’s extended memory. There are three key databases: 

- Image database, with digitized thin section photos from all cataloged Brazilian oil 
fields; 

- Grain classification database, with the taxonomies and meaning of all identifiable 
grains; and 

- Thin section attributes database, with the attributes that map a sample to a rock. 

We built a user-friendly interface to interact with these databases. It is a menu-based 
interface that shows users a generated natural-language version of the query. 












3.3 The Neural Net 

Neural networks were traditionally applied to classification tasks in simple or linear 
(one-dimensional) environments. In 1989 Decatur [2] applied neural networks to the 
classification of terrain in radar images. His attempt was a turning point because, 
before this work, classical statistical methods (Bayesian 
and nearest-neighbor classifiers) were preferred. Since 1990 various researchers have 
presented cases of recognition of remotely sensed images. Roli, Serpico and Vernazza [3] 
described the evolution of these attempts. The back-propagation paradigm was the preferred 
one. In 1993 Hwang et al. [4] introduced the radial basis function neural network, in which the 
training time is substantially reduced. 

The common feature of all these models is the decomposition of the image into a set of 
different channels, one channel for each sensor, with separate treatment of the 
channel outputs. Multilayer perceptrons, according to Freeman [5], analyze each 
pattern and, in some cases, the outputs of the channels are combined in a tree-like 
architecture ending in a majority decoder. 

Compared with the regularity of the plantations, lakes, rivers and roads detected by 
remote sensing, the bioclast images show the ambition of the ADDGEO 
project. It was feasible because the neural network is used to generate 
hypotheses, which are then confirmed by the built-in expert system of ADDGEO. 

In ADDGEO the task was to recognize large bitmaps of ten to fifty thousand 24-bit 
pixels. Moreover, the images could come from different sources with different 
resolutions and representations (jpeg, bmp, tiff, etc.). 

These aspects substantially increase the complexity of the neural network task. In the 
current version the chosen learning rule and paradigm was back propagation. 
Most of the effort was directed at image filtering, in order to transform 
bitmaps of thousands of pixels into tractable sets of input data. These goals were 
achieved using mathematical morphology methods such as color segmentation, edge 
detection and majority code simplification. 

The neural network was a three-layer network, implemented with the NeuralWare product 
[6]. The input layer was fed by the simplification applied to the filtered bitmap. 

At the end of the simplification task, the input data consisted of 625 bits, which formed 
the input layer. 
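One way a filtered binary image could be reduced to 625 bits is a majority vote over the cells of a 25 x 25 grid, since 25 x 25 = 625. The paper does not state the grid shape or the exact simplification used, so the Python sketch below is only an assumption-laden illustration; the function name `majority_downsample` is hypothetical.

```python
def majority_downsample(mask, grid=25):
    """Reduce a binary mask to grid * grid bits by majority vote per cell.

    mask: list of rows of 0/1 integers, with height and width >= grid.
    A cell becomes 1 only when strictly more than half of its pixels are 1.
    """
    height, width = len(mask), len(mask[0])
    bits = []
    for gy in range(grid):
        y0, y1 = gy * height // grid, (gy + 1) * height // grid
        for gx in range(grid):
            x0, x1 = gx * width // grid, (gx + 1) * width // grid
            cell = [mask[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            bits.append(1 if 2 * sum(cell) > len(cell) else 0)
    return bits
```

The output length is fixed at `grid * grid` regardless of the input resolution, which is what allows images from different sources to feed the same input layer.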

The geologists determined 41 kinds of bioclasts to recognize. These classes 
were binary coded, using six bits in the output layer. 

The single hidden layer was composed of four elements. The learning cycle 
involved 100,000 rounds. The content of the output layer was interpreted as an index into a 
table. This table holds the characteristics of the corresponding bioclast. 
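Six output bits suffice because 2^6 = 64 is at least 41. The Python sketch below shows one plausible binary coding of the class index and the decoding of network outputs back to a table index. The bit order, the 0.5 threshold and the function names are assumptions made for illustration; the paper does not specify them.

```python
N_CLASSES = 41  # kinds of bioclast reported in the text
N_BITS = 6      # width of the output layer reported in the text


def encode_class(index):
    """Return the class index as a list of N_BITS bits, most significant first."""
    if not 0 <= index < N_CLASSES:
        raise ValueError("class index out of range")
    return [(index >> shift) & 1 for shift in range(N_BITS - 1, -1, -1)]


def decode_outputs(activations, threshold=0.5):
    """Threshold the output activations and read them as a table index.

    Returns None when the bit pattern does not correspond to a known class.
    """
    index = 0
    for activation in activations:
        index = (index << 1) | (1 if activation >= threshold else 0)
    return index if index < N_CLASSES else None
```

Of the 64 possible patterns, 23 do not name a class, so a decoder has to decide what to do with them; returning `None` here is simply one option.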

This task is a seven-step sequence: 

1. The user selects a bioclast using the ordinary selection method (fencing the 
bioclast in a rectangular boundary). 

2. The user pushes a button to indicate that he (or she) wants to recognize a pattern. 

3. The system applies a sequence of filters to the selected image to find a probable 
boundary. 

4. The probable boundary is simplified and transformed into a bit vector. 

5. The bit vector is used as the neural network input in the pattern recognition task. 

6. The neural network processing results in a 6-bit number that is used as a table 
index. 







7. The table entry obtained is shown to the user, identifying the inferred bioclast 
and exhibiting various images of the same bioclast family to reinforce 
the confidence of the user. 



4 Using ADDGEO to Identify Rock Elements in 
Carbonate Rock Thin Sections 



Suppose a geologist receives a set of thin sections to analyze and interpret, and must 
generate a diagnosis of the rock represented by the samples. Using the main interface, a new 
project (or a new case study) is created that includes all images of the rock thin sections. 
The images remain available in the upper left part of the interface (Studying Area), as 
shown in Fig. 2. 



Fig. 2. ADDGEO system main interface screen dump. Labeled regions: Main menu, Image study 
toolkit, Work Area, Data Input Area, Neural network, Diagnosis Area. 



The user enters the general thin section data coming from the main geological 
laboratory by filling out the form presented in the lower left area of the display (Data Input 
Area). 







As soon as the general data have been provided to the system, the user can create 
alternative diagnoses in the lower right part of the display (Diagnosis Area). It is 
important to note that there are no mandatory fields and the user can omit any field. 
The user can create as many alternative diagnoses as he or she considers necessary and 
release only the relevant ones to the corporate intranet. The diagnosis can be made 
entirely by the user or with the system’s help. A user who is able to identify the grains 
does not need the system’s help. A user who wants help presses the 
ADD button and waits for the system’s directives. 

The system, based on the information provided by the user, tries to identify the context of 
the help request. If the information is insufficient, the system interacts with the user (in a 
question-and-answer mode) until it reaches a conclusion. The system-user interaction 
takes place in the upper right area of the interface (Working Area). A neural network can 
also help the identification process. A special button triggers automatic image 
recognition by the net. Owing to image complexity, the recognition precision of the 
neural net has been low. 



5 Conclusion 

We presented a hybrid system (symbolic and neural) to help recognize highly complex 
images. Our model was applied to offshore carbonate rocks. Although the 
system has changed the way geologists execute their tasks, it has been well accepted 
because it reduces task completion time, standardizes the diagnosis task inside the 
company and speeds up the training of new geologists. 



6 References 

1. Garcia, A. C. B. - Active Design Documents: A New Approach for Supporting 
Documentation in Preliminary Routine Design - Ph.D. Thesis - Civil Engineering 
Department - Stanford University - 1992. 

2. Decatur, S. E. - Applications of Neural Networks to Terrain Classifications - Proceedings 
of the International Joint Conference on Neural Networks 89 - Washington - 1989. 

3. Roli, F., Serpico, S. and Vernazza, G. - Neural Networks for Classification of Remotely 
Sensed Images, in Fuzzy Logic and Neural Networks Handbook - McGraw Hill Inc. - New 
York - 1996. 

4. Hwang, J. N., Lay, S. R. and Kiang, R. - Robust Construction Neural Networks for 
Classification on Remotely Sensed Data - Proceedings of World Congress on Neural 
Networks 93 - Portland - 1993. 

5. Freeman, J. A. and Skapura, D. M. - Neural Networks: Algorithms, Applications and 
Programming Techniques - Addison-Wesley Publishing Company - Reading - 1991. 

6. Klimasauskas, Casimir C. and Guiver, John P. - NeuralWorks - NeuralWare Inc. - Pittsburgh - 
1988. 




SOMulANT : Organizing Information 
Using Multiple Agents 



Tim Hendtlass 

Center for Intelligent Systems and Complex Processes 
School of Biophysical Sciences and Electrical Engineering 
Swinburne University of Technology 
PO Box 218, Hawthorn, Australia. 3122 
thendtlass@swin.edu.au 



Abstract. This paper introduces an algorithm, inspired by ants, that allows 
information to be sorted onto a map. The agents that carry out the mapping 
have vision and an ability to measure the local disorder at points on the map. 
They use these to decide when to lift information and where and when to place 
it down. Results of sorting a well-known data set, the iris data set, are 
presented and the importance of various parameters discussed. 

Keywords. Collective intelligence, self organizing map, information mapping, 
multiple agents. 



Introduction 



Sorting information into appropriate fuzzy categories is a difficult yet fundamental 
and frequent activity. Often the information is mapped, typically to a lower number 
of dimensions, so that the spacing between pieces of information reflects the strength 
of one or more relationships between them. A map built by a process that is partly 
stochastic and partly procedural may at times reveal previously unrecognized 
relationships within the information. Self-organizing artificial neural networks have 
been used for some years for this purpose [see, for example, 1]. 

In nature ants are known to be efficient at sorting, for example sorting larvae as 
they mature. Models of how they do this have been applied to sorting information, to 
see if an ant-like efficiency can be reproduced. The term agent will be used for the 
rest of this paper rather than ant since the resulting algorithms are only inspired by 
ants and make no attempt to replicate the actual characteristics of any type of ant. 

Deneubourg et al [2] proposed a model for this behaviour in which randomly 
moving unloaded agents pick up items of a particular type with a probability of 

$$P_p = \left( \frac{k_1}{k_1 + f} \right)^2 \qquad (1)$$

and loaded agents drop what they are carrying with a probability of 

$$P_d = \left( \frac{f}{k_2 + f} \right)^2 \qquad (2)$$



R. Loganantharaj et al. (Eds.): IEA/AIE 2000, LNAI 1821, pp. 322-327, 2000. 
© Springer-Verlag Berlin Heidelberg 2000 







where $k_1$ and $k_2$ are constants and $f$ is the average number of items encountered in 
some time period. This algorithm allows you to sort clearly distinguished classes of 
items so that each type appears in a different place. A variant of this algorithm 
proposed by Oprisan et al [3] discounts the influence of previously encountered 
objects by a time factor so that recently encountered objects have more weight than 
objects encountered long ago. This is to limit the size of object piles. 
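
The two probabilities in this model can be sketched in a few lines of Python. This is only an illustration, assuming the usual form of the rule, P_p = (k1 / (k1 + f))^2 for picking up and P_d = (f / (k2 + f))^2 for dropping; the function names and the constants in the usage lines are hypothetical and are not taken from the paper.

```python
def pickup_probability(f, k1):
    """Chance that an unloaded agent picks up an item.

    f  : average number of similar items recently encountered
    k1 : pick-up constant (illustrative value chosen by the caller)
    """
    return (k1 / (k1 + f)) ** 2


def drop_probability(f, k2):
    """Chance that a loaded agent drops the item it carries."""
    return (f / (k2 + f)) ** 2


# Isolated items (f close to 0) are almost always picked up and rarely dropped;
# items surrounded by many similar items are mostly left where they are.
for f in (0.0, 0.1, 1.0):
    print(f, pickup_probability(f, k1=0.1), drop_probability(f, k2=0.3))
```

Squaring the ratio makes both probabilities change steeply around f = k, which is what lets small piles dissolve while large piles grow.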

Lumer and Faieta [4] have generalised this algorithm to allow for more complex 
sets of objects with A attributes in which various types may have degrees of similarity 
with each other. The algorithm projects the A-dimensional space onto a lower 
(usually two) dimensional grid in such a way as to provide intracluster distances 
smaller than intercluster distances while also preserving some of the interobject 
relationships that existed in the original space. They define a local density 
function $f(o_i)$ of an object $o_i$ within a square of size $S$. Let $L$ be the list of all objects 
within that square; if $L$ is empty, $f(o_i)$ is zero. If not, 

$$f(o_i) = \frac{1}{S^2} \sum_{o_j \in L} \left[ 1 - \frac{d(o_i, o_j)}{\alpha} \right] \qquad (3)$$

where $d(o_i, o_j)$ is the Euclidean distance between objects $o_i$ and $o_j$ in the original 
space. 
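
A minimal Python sketch of such a local density function follows. It assumes the commonly quoted form f(o_i) = (1/S^2) * sum over o_j in L of (1 - d(o_i, o_j)/alpha), where alpha is a scale constant; the function name, argument names and sample vectors are hypothetical.

```python
import math


def local_density(obj, neighbours, side, alpha):
    """Local density of `obj` within a square of side `side`.

    obj        : attribute vector of the object being judged
    neighbours : attribute vectors of the objects in the list L
    alpha      : assumed scale constant for dissimilarity
    """
    if not neighbours:  # L is empty, so the density is zero
        return 0.0
    total = sum(1.0 - math.dist(obj, other) / alpha for other in neighbours)
    return total / (side * side)


# An object surrounded by similar objects scores higher than one
# surrounded by dissimilar objects.
print(local_density((5.0, 3.5), [(5.1, 3.4), (4.9, 3.6)], side=3, alpha=2.0))
print(local_density((5.0, 3.5), [(7.0, 3.0), (6.5, 2.8)], side=3, alpha=2.0))
```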

This paper extends this work in two ways to produce the SOMulANT algorithm. 
Firstly the agents have vision and, for any cell within their sight, can measure the 
local disorder index. This is similar to Lumer and Faieta’s local density function, 
except that the further Oi is from Oj the less influence it has. Secondly, the agents 
move to some degree in a directed fashion. The probability that an agent continues to 
move in its previous direction is known as the momentum of the agent. Agents that 
do not continue to move in this way move towards the area within their sight in which 
their load will best fit (if loaded) or to the area of highest local disorder within their 
sight (if unloaded). 



The SOMulANT Algorithm 



The algorithm is built on the concept of a measure of local disorder. The value of this 
for any location reflects the distribution and variation of all the data examples visible 
to the agent, currently in the small region R of radius O_radius, centered on the agent’s 
position. While this measure can be defined for a map with any number of 
dimensions, the rest of this paper concentrates on two-dimensional maps. 

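Let $Dist_{i,xy}$ be the distance of data example $i$ from the center $x,y$ of the region $R$ and 
let $Diff_{ij}$ be the difference between data example $j$ located at $x,y$ and another example $i$ 
located somewhere within the region $R$. The local disorder is then a double sum over all 
examples $j$ at $x,y$ and all examples $i$ within region $R$ (the summand, a term in $Diff_{ij}$ 
and $Dist_{i,xy}$, is illegible in the source): 

$$LD_{x,y} = \sum_{\substack{\text{all examples} \\ j \text{ at } x,y}} \; \sum_{\substack{\text{all examples} \\ i \text{ within region } R}} \left[ \, \ldots \, \right] \qquad (4)$$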
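
The structure of this measure can be sketched in Python. The double sum over examples j at the cell and examples i within the region follows the text, but the summand used here, Diff_ij / (1 + Dist_i,xy), is purely an assumption chosen so that more distant examples have less influence; it should not be read as the paper's own formula. All names are hypothetical.

```python
import math


def local_disorder(center, examples_here, examples_in_region):
    """Sketch of a local disorder measure for the cell at `center`.

    examples_here      : attribute vectors of the examples j lying at `center`
    examples_in_region : (position, attributes) pairs for the examples i in R
    """
    total = 0.0
    for attrs_j in examples_here:
        for pos_i, attrs_i in examples_in_region:
            diff = math.dist(attrs_j, attrs_i)  # Diff_ij, assumed Euclidean
            dist = math.dist(center, pos_i)     # Dist_i,xy on the map
            total += diff / (1.0 + dist)        # ASSUMED distance weighting
    return total


here = [(5.1, 3.5, 1.4, 0.2)]
region = [((4, 5), (4.9, 3.0, 1.4, 0.2)), ((6, 7), (6.3, 3.3, 6.0, 2.5))]
print(local_disorder((5, 5), here, region))
```

With this weighting, a cell surrounded by similar examples scores near zero, and a dissimilar example contributes less the further away it lies on the map.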



An agent has a limited vision range. Within this range the agent can measure the 







local disorder at any point. If unloaded it has a certain probability (controlled by the 
agent’s momentum) of continuing along its current path. If it does not continue it 
looks for the region with the highest local disorder visible to it and heads in that 
direction. At each step it may pick up an example from underneath it. The 
probability that it does so is proportional to how much doing this would decrease the 
local disorder of the agent’s current location. 

An agent can only carry one data example at a time. When loaded there is again a 
certain probability (controlled by the agent’s momentum) the agent will continue 
along its current path. However, now if it does not continue it looks for the region it 
can see whose local disorder would be least increased if the example it carries were to 
be dropped there and heads in that direction. At each step it may drop the example it 
carries with a probability proportional to the increase this would cause in the local 
disorder. 

Let the agent be at position x,y and let E be the example that the agent is either 
considering picking up or dropping. Let $LD_{withE}$ be the local disorder with example E 
dropped at x,y and $LD_{withoutE}$ be the local disorder with example E removed from x,y. 

The probability that example E will be picked up is: 

$$P_{pu} = \alpha \, \frac{LD_{withE} - LD_{withoutE}}{1 + LD_{withE} - LD_{withoutE}} \quad \text{if } LD_{withE} > LD_{withoutE}, \; 0 \text{ otherwise.} \qquad (5)$$

The probability that example E would be dropped at x,y is: 

$$P_D = [\,\text{expression in } \beta \text{ and } LD_{withE}\text{; not legible in the source}\,]$$

The parameters $\alpha$ and $\beta$ in the equations above are a measure of how sensitive the 
agent is to the local disorder. With each step during which an unloaded agent does 
not pick up a load, the value of $\alpha$ for that agent may be increased slightly; as soon as 
it loads, $\alpha$ is returned to its original value. Similarly, with each step during which a 
loaded agent does not drop its load, the value of $\beta$ for that agent may be increased 
slightly; as soon as it unloads, $\beta$ is returned to its original value. This can be useful in 
allowing probabilistic map perturbations that may minimize any tendency of the map 
to settle in sub-optimal configurations. The full algorithm is: 

Deposit examples in a pile at the center of the map space. Randomly distribute 
agents across map space each with a random direction of travel. 

Repeat For each agent in turn: 

If the agent is unloaded: 

Select a random number. If less than probability agent will continue in its 
current direction, record agent movement as one unit in its current direction. 
If not, find highest local disorder position within the range of vision of agent. 
If first step on the way there is unoccupied move agent, else leave the agent 
where it was but assign a new random direction. If the agent is over at least 
one example, calculate the highest pickup probability Ppu for the data 
examples under the agent. Choose a random number and pick up that 
example if the random number is less than this probability. 







If the agent is loaded: 

Select a random number. If less than probability agent will continue in its 
current direction, record agent movement as one unit in its current direction. 
If not, find the map position within the range of vision of this agent whose 
local disorder would be least increased if the data example currently carried 
were to be dropped there. If first step on the way there is unoccupied move the 
agent, else leave the agent where it is but change to a new random direction. 
Calculate the probability $P_D$ that the agent drops the data example it is 
carrying. Choose a random number and drop the data example being 
carried if the random number is less than this probability. 

If the agent either loaded or unloaded this move reset this agent's sensitivity 
to the initial value, else increase this agent's sensitivity. 

Until a suitable map has emerged. 
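
The decisions made by an unloaded agent during one pass of this loop can be sketched as follows. This is a simplified illustration, not the author's implementation: the pick-up formula is a reconstruction of equation (5) and should be treated as an assumption, the multiplicative sensitivity factor of 1.01 is borrowed from the caption of Fig. 2 and assumed to be applied per step, and every function name is hypothetical. The drop probability is omitted because its form is not recoverable from the text.

```python
import random


def pickup_probability(alpha, ld_with, ld_without):
    """Assumed form of Eq. (5): zero unless removing E lowers the disorder."""
    gain = ld_with - ld_without
    if gain <= 0:
        return 0.0
    return alpha * gain / (1.0 + gain)


def next_heading(rng, momentum, current_heading, preferred_heading):
    """Keep the current heading with probability `momentum`, else turn."""
    if rng.random() < momentum:
        return current_heading
    return preferred_heading


def update_sensitivity(value, initial, acted, factor=1.01):
    """Reset after a pick-up or drop; otherwise grow slightly (assumed rule)."""
    return initial if acted else value * factor


# One step of an unloaded agent, with made-up disorder values.
rng = random.Random(0)
initial_alpha = alpha = 0.3
heading = next_heading(rng, 0.25, "north", "east")
picked = rng.random() < pickup_probability(alpha, ld_with=2.0, ld_without=1.0)
alpha = update_sensitivity(alpha, initial_alpha, acted=picked)
```

A full simulation would wrap these calls in the "repeat for each agent" loop and add the occupancy check on the first step towards the preferred position.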

Methodology 

The test problem reported here is the sorting of the iris data set [5]. The data set 
contains examples of iris flowers of varieties setosa, versicolor or virginica, giving the 
sepal length and width, petal length and width and class for each. Experience has 
shown that, given this information, it is relatively simple to identify all examples of 
the setosa class, but that some examples of the other two classes are hard to tell apart. 
The agents were required to sort these classes onto a 10 x 10 grid, the minimum size 
on which a regular self-organising map algorithm is reliably able to sort them. Rather 
than randomly distribute the objects onto the grid the problem was made more 
difficult by piling all the objects on top of each other in the very center of the grid. 

The agents were randomly positioned and the algorithm above was executed until 
completion was achieved. Completion required that no grid cell contain objects of 
more than one class and for five algorithm cycles to have passed during which no 
agent carried any item. 
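
The completion test described above is simple enough to state directly in code. The sketch below assumes the map is held as a dictionary from grid coordinates to the class labels of the objects in that cell, and that the caller counts consecutive cycles in which no agent carried anything; both representations are illustrative assumptions.

```python
def map_complete(grid, idle_cycles, required_idle_cycles=5):
    """True when no cell mixes classes and agents have been idle long enough.

    grid        : dict mapping (x, y) -> list of class labels in that cell
    idle_cycles : consecutive algorithm cycles in which no agent carried an item
    """
    every_cell_pure = all(len(set(labels)) <= 1 for labels in grid.values())
    return every_cell_pure and idle_cycles >= required_idle_cycles


grid = {(0, 0): ["setosa", "setosa"], (4, 7): ["virginica"], (5, 7): []}
print(map_complete(grid, idle_cycles=5))
```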



Results and Discussion 



[Figure: "IRIS TRAIN DATA", a grid map with each occupied cell marked by class. Legend: ♦ Setosa, ■ Versicolor, Virginica (marker lost in extraction).] 

Fig. 1. A typical map of the iris data produced by the SOMulANT algorithm 



Figure one shows a typical map produced. In this case the radius over which disorder 
was measured was three grid cells, the value of momentum was 0.25, and $\alpha$ and $\beta$ 
were both 0.3. Note that the classes versicolor and virginica overlap slightly, as might 
be expected from the results obtained from building self organizing maps using this 
data. This is typical of the completed maps built over a wide range of parameter values. 



Momentum (and resetting the direction of blocked agents) is required; otherwise a jam 
may occur, especially during the early stage of map evolution. This is exacerbated if a 
large number of agents is used. The results described here were obtained with only 
five agents. 



[Figure: "Distribution of object movement sizes"; x-axis: movement size (cells); series: alpha 0.5 beta 0.1, alpha 0.3 beta 0.1, alpha 0.3 beta 0.3, alpha 0.3 beta 0.2.] 

Fig. 2. The variation in the distribution of the size of movements of information as a 
function of alpha and beta. In all cases the vision was set to 10 cells. The agent’s 
sensitivity modification factor was 1.01. 

Fig. 3. Algorithm cycles to map completion times and the total number of loaded and 
unloaded agent steps as a function of the values of alpha and beta. 



[Figure: "Algorithm cycles to completion of a three cluster map"; x-axis: vision range (cells); y-axis: algorithm cycles, 0 to 30000.] 

Fig. 4. Map completion time as a function of the agent’s range of vision. For all maps 
alpha = 0.3 and beta = 0.2. 

The majority of objects were only carried for a single cell before being dropped. As 
shown in figures two and three, changing the value of $\alpha$ to be higher than one and a 
half times the value of $\beta$ increased both the average length of carry and the total 
number of cycles through the algorithm required to complete the map. Increasing the 
value of $\alpha$ up to one and a half times the value of $\beta$ proved beneficial. As shown in 
figure four above, the longer the range of vision of the agents, the shorter the time it 
took to complete a three cluster map. 

A vision range less than one half of the map dimensions (5 cells in the case of the 
maps being discussed here) resulted in maps with more clusters than necessary. For a 
vision range of 4 cells the maps were, however, complete in that no cell contained 
more than one type of object, definite clustering was observed and for a number of 
algorithm cycles no agent carried any object. Eventually the increases in $\alpha$ from 
being unloaded would result in the agents starting to modify the map again. For a 
vision range of 5 cells a three-cluster map was eventually produced after a number of 
completed maps with more than three clusters had been produced. 

For the vision ranges in figure four, maps with three clusters were produced; 
the longer the vision range, the smaller the total number of cycles through the 
algorithm required to produce the map. Giving agents different visual ranges between 
5 and 14 cells while building a map did not seem to change the resulting map but did 
increase the total number of cycles through the algorithm required to produce it. 

As the number of cells an agent could see with its visual range increased, the time 
taken to complete one algorithm cycle also increased as the agent had more 
possibilities to consider before moving. As a result the visual range to produce the 
map in the shortest time was not necessarily the same as the vision range to produce 
the map in the smallest number of algorithm cycles. 

Conclusion 

The SOMulANT algorithm has been effective at mapping a range of relatively simple 
data sets in addition to the data set described in this paper. Unlike a conventional 
self-organising map it does not need a conscience applied to ensure that information is 
spread across the map. The inherently lower disorder of boundary cells ensures that 
information is spread right to the edge. Agent visual range is the main factor in 
deciding the map ordering scale, while the values of $\alpha$ and $\beta$ largely determine the 
number of cycles through the algorithm needed to produce a completed map. 

References 

1 Kohonen T. ‘Self-Organization and Associative Memory’ Springer-Verlag, New 
York. 

2 Deneubourg J-L, Goss S, Franks N, Sendova-Franks A, Detrain C and Chretien 
L. ‘The Dynamics of Collective Sorting: Robot-Like Ants and Ant-Like Robots.’ 
In Proceedings First European Conference on Simulation of Adaptive 
Behaviour: from Animals to Animats, edited FJ Varela and P Bourgine, pages 
123-133. Cambridge MA, MIT Press 1992. 

3 Oprisan S.A., Holban V. and Moldoveanu B. ‘Functional Self-Organization 
Performing Wide-Sense Stochastic Processes’ Phys. Lett. A 216, 303-306, 1996. 

4 Lumer E and Faieta B. ‘Diversity and Adaptation in Populations of Clustering 
Ants’. In Proceedings Third International Conference on Simulation of 
Adaptive Behaviour: from Animals to Animats, pages 499-508, Cambridge MA, 
MIT Press 1994. 

5 Batchelor B.G. ‘Practical Approach to Pattern Recognition’. Plenum Press, 
New York 1974. 




Inventiveness as Belief Revision and a Heuristic 
Rule of Inventive Design 



Y. B. Karasik 
Nortel Networks Inc. 

P.O. Box 3511, Station C, Ottawa, K1Y 4H7, Canada 
yevgeny@nortelnetworks.com 



Abstract. The notion of inventive belief revision, as opposed to the 
notion of trade-off belief revision, is introduced. A heuristic rule for 
identifying the beliefs to be revised is proposed. It is shown that some well 
known inventions might be arrived at with the help of this rule, which 
means that the rule can be used as a powerful tool in computer aided 
inventive design. 

Keywords: belief revision, innovation modeling, computer aided inventive 
design, logics in artificial intelligence. 



1 Introduction 

As is known, the design of any system usually starts with high level design, followed 
by detailed design consisting of a number of iterations, each of which 
refines the previous one. During high level design, a designer may set objectives 
$O_1, O_2, \ldots, O_n$, none of which seem to be contradictory at this stage. 
However, in the course of subsequent refinements, the designer may freely or 
otherwise make assumptions $a_1, a_2, \ldots, a_m$ about ways of achieving the objectives 
and/or face some natural constraints $c_1, c_2, \ldots, c_k$, in view of which the 
objectives may become contradictory. 

For example, suppose that one is going to design an aircraft capable of 
carrying 200 passengers (objective $O_1$) a distance of 10,000 km (objective $O_2$) 
without a landing (objective $O_3$) at a speed of 700 km/h (objective $O_4$). These 
objectives are obviously not contradictory in themselves. However, if one decides 
to equip the aircraft with gas turbine engines (assumption $a_1$), then it 
turns out that in order to achieve the objectives, the engines’ propellers have to 
be 9 meters long. In order to accommodate these huge propellers, the aircraft’s 
undercarriage legs must be inadmissibly long and, therefore, too weak, which 
makes landing and taking off impossible. Thus, the assumption $a_1$ makes the 
objectives $O_1$, $O_2$, $O_3$, and $O_4$ contradictory. 

When a designer encounters such a contradiction, he may choose to compromise 
some objectives. This is called trade-off design. In the above case, trade-off 
design could result in compromising either speed, the distance of flight without 
landing, or the desired payload. 



R. Loganantharaj et al. (Eds.): IEA/AIE 2000, LNAI 1821, pp. 328-332, 2000. 
© Springer-Verlag Berlin Heidelberg 2000 






Trade-off design fits well into the paradigm of the AGM belief revision [2]. 
Indeed, the designer has a specification (spec) on the system to be designed. The 
spec includes objectives and assumptions. The logical inferences coupled with the 
required calculations reveal an inconsistency in the spec, and the designer needs 
to revise it in order to make it consistent. 

However, besides trade-off design there is also inventive design, in which the 
designer finds a solution without compromising objectives in spite of 
contradictions [1]. For example, in the above case with the aircraft, a possible inventive 
solution is to equip the engine with two small coaxial propellers rotating in 
opposite directions rather than to equip it with one 9 m long propeller. Exactly 
this solution was implemented on some gas turbine airliners in the 1950s. This 
invention allowed the designers to achieve objectives $O_1$, $O_2$, $O_3$, and $O_4$ without 
compromise. 

In a sense, it is also belief revision, namely revision of the assumption that 
an engine has one propeller. However, unlike assumption $a_1$, this assumption 
has not been made explicitly. It was a tacit, unrecognized assumption that the 
designers took for granted. The number of such unrecognized assumptions is 
astronomically large in any design, and even a very thorough specification of a system 
cannot list all of them. 

As far as I know, no work has been done on belief revision when only a 
portion of the belief system is recognized, in other words, when the database 
does not contain all the pieces of information and there are hidden pieces which 
are encoded in the rules of deriving inferences from the database. Moreover, 
to the best of my knowledge, no rules of reasoning other than purely 
logical inferences have been taken into account in works on belief revision. 

I use the term inventive belief revision for belief revision in which only some pieces 
of information are explicitly represented in the database and most pieces are 
implicitly hidden in the rules of deriving consequences, which besides logical 
inferences may include calculations and other procedures. 

In this paper, I propose some rules of inventive belief revision and show how 
they could be used to solve challenging inventive problems of the past. 

2 Resolving Contradictions by Separating Contradictory 
Requirements 

One of the major problems in belief revision is how to find that particular belief 
which should be retracted in view of the discovered inconsistency in the belief 
system. Even when the belief system is finite and all beliefs are explicitly stated, 
it is a difficult problem because usually there is no unique solution: consistency 
can be restored by retracting one of many possible pieces of information. 
Most publications on belief revision study the latter problem of making a choice 
amongst equally possible revisions rather than the problem of identifying all 
possible revisions. For finite belief systems, this is justified because the set of all 
possible revisions can be computed either way (say, by a brute force algorithm). 






On the contrary, in inventive belief revision, the main problem is not 
that of choice but of identifying some hidden assumptions, alteration of which 
leads to a solution. That is why I focus here on proposing a vehicle of 
transition from the contradictory requirements encountered in the course of the design 
process to the sought-for hidden assumptions causing the contradiction. 

To this end, let us take a closer look at the above airplane example. The 
contradictory requirements that the designers faced in that instance were as 
follows: the length of the engine’s propeller should not exceed some limit (set by the 
reliability of the undercarriage legs) and should exceed it (in order to achieve 
the objectives $O_1$, $O_2$, $O_3$, $O_4$). In the solution, these contradictory requirements 
(namely, the length should not exceed some limit and the length should exceed 
it) turned out to be separated as follows: the length of a single engine’s propeller 
did not exceed the limit but the length of all the engine’s propellers did exceed 
the limit. Such a separation of the contradictory requirements automatically 
implied that an engine should have more than one propeller, which was the 
desired revision of a tacit unrecognized assumption that an engine had one 
propeller only. 

It is easily seen that the contradiction itself contained no hint as to what hidden 
assumption to revise, whereas the separation clause clearly pointed to 
such an assumption. On the other hand, the separation clause could be obtained 
from the contradiction clause by inserting the opposite attributive words single 
and all into the opposite portions of the contradiction clause, followed by 
the corresponding grammatical adjustment of other words. Indeed, the above 
contradictory requirements on the length of the engine’s propeller imply the 
following contradiction clause: the length of the engine’s propeller does not exceed 
the limit and the length of the engine’s propeller does exceed the limit. By 
inserting the opposite words single and all in the opposite portions of this clause 
and by making the entailed grammatical changes in other words, we obtain the separation 
clause: the length of a single engine’s propeller does not exceed the limit and the 
length of all the engine’s propellers exceeds the limit. The hitherto subconscious 
assumption on the number of the engine’s propellers (which should be revised 
in order to solve the above aircraft problem) is now logically derivable from this 
clause. 

Thus, we see that although it may be very difficult to recognize the hidden 
assumption, which causes contradiction, from contradiction itself, it is very easy 
to recognize it from a separation clause, which can be obtained from the former 
one in a formal way. Our next objective is to describe this formalism. 

Let $S$ be the design belief system that includes the objectives and assumptions 
(both explicit and implicit). Suppose that the design process runs into a 
contradiction: $S \vdash R(A)$ and $S \vdash \neg R(A)$, where $R(A)$ means a requirement $R$ 
to an object/subsystem $A$ of the system to be designed. Let $w$ and $w^*$ be some 
opposite/dual attributive words about object/system $A$, and let $w \cdot A$ and $w^* \cdot A$ 
mean that the word $w$ and the word $w^*$ are applied to $A$ respectively. Then 
$S - [R(w \cdot A) \,\&\, \neg R(w^* \cdot A)]$ are those assumptions which should be retracted 
from $S$, where $-$ means the contraction function [2]. 
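
The textual half of this rule, turning a contradiction clause into candidate separation clauses, can be illustrated with a toy Python sketch. It is only a string-template illustration under simplifying assumptions: the requirement and its negation are given as templates with a slot `{A}` for the object, the dual word pairs are taken from the examples in this paper, and the grammatical adjustment that the rule calls for is left to the caller. The contraction step itself is not modelled, and all names are hypothetical.

```python
# Dual attributive word pairs (w, w*) drawn from the examples in the paper.
DUAL_WORDS = [("single", "all"), ("controlling", "controlled"), ("total", "partial")]


def separation_candidates(requirement, negated_requirement, noun, pairs=DUAL_WORDS):
    """Build rough clauses of the form R(w.A) & not R(w*.A), one per dual pair.

    requirement, negated_requirement : templates containing the slot {A}
    noun                             : the object/subsystem A
    """
    clauses = []
    for w, w_star in pairs:
        left = requirement.format(A=w + " " + noun)
        right = negated_requirement.format(A=w_star + " " + noun)
        clauses.append(left + " but " + right)
    return clauses


for clause in separation_candidates(
    "the {A} is weak", "the {A} is strong", "signal",
    pairs=[("controlling", "controlled")],
):
    print(clause)
```

Each generated clause is a prompt for the designer: the hidden assumption to retract is whatever in the belief system rules that clause out.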






I would like to demonstrate this rule on a few more examples. 

3 Examples 

3.1 From the History of the Triode 

When Marconi succeeded in sending Morse code messages consisting of “dots” 
and “dashes” by radio waves, many inventors started to search for ways of 
broadcasting speech and music. The main problem here was that the signals generated 
in receivers by the radio waves were so weak that even “dots” and “dashes” were 
barely distinguishable, not to mention the sounds of speech and 
music. Thus, the attempts to design a music/speech receiver ran into the 
contradiction: the electrical signals in the receiver should be strong enough (in 
order to distinguish the complex sounds of speech and music) but they are too 
weak [3]. 

A possible separation sentence for this contradiction is as follows: the controlling 
signals are weak but the controlled signals are strong. Another separation 
sentence could be like this: the modulating/shaping signals are weak but the 
modulated/shaped signals are strong. Both sentences indicate that the assumption 
that the detected signals are directly fed into the speakers should be revised. 
This realization drives the thought in the direction of the idea of the triode: weak 
detected signals should control and thereby shape strong signals which are fed 
into speakers. As is known, Lee De Forest achieved this by placing a third 
electrode between the two electrodes of the electron tube (diode), applying the 
weak signals to the third electrode and feeding the strong signals from the other 
two electrodes into the speaker. He patented the triode in 1907 and was soon 
able to broadcast a live Metropolitan Opera performance of Enrico Caruso. 

3.2 From the History of the Feedback Circuit 

Amplification of signals by triodes was, however, not sufficient for high quality 
broadcasting. That is why Lee De Forest started to build cascades of triodes 
by feeding the output from the plate of one tube to the grid of the second, and 
the output of the second to the grid of the third, and so forth. This, however, 
led to enormously big receivers. 

Thus, the attempts to build a small high quality receiver ran into the 
contradiction: there should be a cascade of triodes in order to get high quality 
reception and there should not be a cascade in order to have a small receiver. 
The separation clause here looks as follows: there is no cascade in space but 
there is a cascade in time. In other words, the output of a single triode should 
be iteratively fed into its own grid. That was the idea of the feedback circuit 
patented by Edwin Howard Armstrong in 1912 [4]. 

3.3 From the History of the Absorption Refrigerator 

At the beginning of the twentieth century the problem of creating a refrigerator without 
a compressor and other moving parts attracted attention from many famous 
inventors and scientists, including even Einstein [5]. The main difficulty here was 
that in order to get rid of the compressor, the pressure in the evaporator had to be higher 
than in the condenser. However, in order for a refrigerator to cool, the pressure in 
the evaporator has to be lower than in the condenser. 

Thus, we have a contradiction, a possible separation clause of which looks 
as follows: the total pressure in the evaporator is higher than in the condenser but the 
partial pressure of the cooling agent in the evaporator is lower than in the condenser. 

Exactly this idea was implemented by the Swedish inventor Carl Munters in 1922 
when he built the first refrigerator without mechanically moving parts [5]. 

4 Conclusion 

In this paper, I presented a formal rule for generating a sentence from which 
the sought-for revision of the design belief system is almost obvious. The rule is 
empirical and cannot be proven mathematically (as is the case with the laws of 
nature, but I stop short of calling this rule a law). The rule can be demonstrated 
on numerous examples from the history of technology. This does not 
imply that the past inventions which it explains were made by making use of 
this rule. One can arrive at a correct answer not necessarily in a logical way but 
by chance; that does not mean that there is no logic to help find a correct 
answer. After all, people found correct answers to many questions long before 
Aristotle put forward any rules of logic (which he also discovered empirically), but 
since then the procedures of drawing conclusions have been significantly simplified 
and made less error prone (for those who studied and adhered to his logic). 

Analogously, the procedure of navigating amongst contradictions (which is 
the essence of engineering design) can be largely formalized and simplified by 
making use of the proposed rule. The only element of uncertainty (or freedom, 
if one likes) which is present in the rule is how to find the proper opposite/dual 
words to insert into the proposition and the contraposition respectively. This 
uncertainty can be further decreased by analysing plausible separation clauses 
of past inventions and compiling a thesaurus of such words. Coupled with 
such a thesaurus, the above rule can be turned into a powerful tool of inventive 
reasoning in artificial intelligence. It has been successfully tested in the courses on 
inventive creativity conducted by the author and employed by many graduates 
of the courses in their daily practices. 

References 

1. G. S. Altshuller: Creativity as an Exact Science. Gordon and Breach, New York 
(1984). 

2. C. E. Alchourron, P. Gardenfors, D. Makinson: On the logic of theory change: 
partial meet contraction and revision functions. The Journal of Symbolic Logic, 
50 (1985) 510-530. 

3. I. E. Levine: Electronics Pioneer: Lee De Forest. Messner, New York (1964). 

4. L. P. Lessing: Man of High Fidelity: Edwin Howard Armstrong: a biography. 
Lippincott, Philadelphia (1956). 

5. V. Ya. Frenkel, B. E. Yavelov: Einstein the Inventor. Nauka, Moscow (1981). 



A Decision Support Tool 
for the Conceptual Design of De-oiling Systems 



Badria Al-Shihi (1), Paul W. H. Chung (2) and Richard G. Holdich (1) 

(1) Department of Chemical Engineering, Loughborough University, Loughborough, UK 
(2) Department of Computer Science, Loughborough University, Loughborough, UK 
p.w.h.chung@lboro.ac.uk 



Abstract. De-oiling of petroleum wastewater is a major concern in petroleum 
process engineering. Decision support systems (DSS) have been used in 
assisting operators in evaluating different disposal options of production water, 
but not the de-oiling process. Also, no application has been reported in assisting 
the de-oiling of other petroleum waters such as process, ballast or drainage 
water. This paper describes a DSS for the COnceptual DEsign of de-oiling 
Systems (CODES) for handling different types of waste water by supporting the 
tasks of: 

- assessing the types and magnitudes of waste-water streams 

- exploring the feasibility of mixing different streams 

- selecting the types of de-oiling equipment at different stream locations 

- considering the need for multi-stage treatment to meet quality requirements 
stated in standards and regulations. 

CODES is implemented in Microsoft Excel and is accessed via a web-based 
front-end. 



1 Introduction 

The use of decision support systems (DSS) in industry is proving to be beneficial. The 
petroleum industry was one of the earliest industries that used DSSs to assist its 
operations. DSSs were mainly applied in aiding exploration, drilling operations 
and controlling refining operations. DSSs were also applied in well control and 
stimulation and process control to assist production operations. Due to strict 
environmental regulations, a major concern in the petroleum industry is the 
processing and disposal of petroleum waters. Petroleum waters can be classified as 
production, process, ballast, drainage, cooling, sewage and drilling water. The first 
four have similar characteristics and have similar treatment methods. The latter three 
require special treatments and are not considered in this paper. 

Guidelines for de-oiling equipment design are voluminous and follow a sequential
approach, and no specific computer tools have been developed to support the
conceptual design of de-oiling equipment. The lack of computational support results
in lengthy design times and error-prone calculations. This project began by examining
the tasks that need to be carried out during the conceptual design stage as detailed in
design guidelines. An activity model that represents the design process was then
created. The activity model, expressed in UML, shows explicitly the ordering of tasks.
Some tasks need to be carried out in sequence, while others can be carried out in
arbitrary order. The activity model provided the basis for the design of a DSS for the
COnceptual DEsign of de-oiling Systems (CODES). Knowledge required for the different
design tasks was gathered from design guidelines, books and an expert in the field.
CODES was then implemented in Microsoft Excel. Spreadsheets were used for input
and output. Inference and calculation were implemented in Visual Basic. CODES also
has a web-based front-end written in HTML and Java to provide multi-user access to
the DSS and the related descriptive information.

R. Loganantharaj et al. (Eds.): IEA/AIE 2000, LNAI 1821, pp. 334-344, 2000.
© Springer-Verlag Berlin Heidelberg 2000

The conventional procedure for de-oiling system design is briefly considered
in Section 2 to provide the necessary background. Section 3 describes the activity
model that has been developed. Section 4 gives an overview of CODES. The
paper ends with a summary and a few observations based on a case study that was
carried out using the tool.



2 Conventional Design Procedure 

The conventional design procedure consists of nine stages: 

1. Identifying the wastewater source and magnitude - Petroleum water is classified 
as one of the following types: production water, process water, ballast water, 
drainage water, drilling water, sewage water and cooling water. The latter three 
types have special and complicated treatment methods that are different from the 
others. 

2. Identifying the concentration and nature of contaminants in water streams - In 
most cases, effort is made to design treatment systems for the separation of 
dispersed hydrocarbons from the oily water, as this is the concern of most 
standards and regulations. 

3. Identifying the quality requirements for disposal options (surface or sub-surface
disposal) - The discharge limits vary from country to country. Typically,
offshore discharge should not exceed a limit of 20-40 mg of oil per litre of
water. The coastal and inland limits are lower still, in the order of
5-10 mg of oil per litre of water.

4. Selecting the most suitable location for de-oiling equipment - The best location
for much of the available de-oiling equipment is where conditions of low shear
and coalescence occur.

5. Identifying the process constraints and ways to ease them - Constraints of water 
de-oiling include shear introducing devices or fittings. It is recommended to 
substitute these devices with low shear ones or to place them downstream of the 
de-oiling equipment. 

6. Selecting the desired number of water treatment stages for the required water
quality - It is preferable to have more than one stage of treatment for wastewater,
both to decrease the load on any single unit and to increase the quality of the
treated effluent.







7. Selecting the suitable water treatment equipment by considering the droplet size 
distributions and other constraints 

8. Identifying the methods for treating and disposing of any secondary streams - This
will involve recycling, mixing or separate treatment trains, depending on
the nature of the stream contaminants.

9. Revision of design and process integration 
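The discharge limits quoted in stage 3 lend themselves to a simple screening check. The following Python sketch is illustrative only: the function name, the dictionary layout and the three-way verdict are assumptions made here, and the numeric ranges are the typical figures quoted above rather than the limits of any particular country.

```python
# Typical oil-in-water discharge limit ranges (mg of oil per litre of water),
# taken from stage 3 of the procedure. Real limits vary from country to country.
TYPICAL_LIMITS_MG_PER_L = {
    "offshore": (20.0, 40.0),
    "coastal": (5.0, 10.0),
    "inland": (5.0, 10.0),
}


def screen_discharge(oil_mg_per_l: float, location: str) -> str:
    """Compare an effluent oil concentration with the typical limit range."""
    strictest, most_lenient = TYPICAL_LIMITS_MG_PER_L[location]
    if oil_mg_per_l <= strictest:
        return "meets typical limits"
    if oil_mg_per_l <= most_lenient:
        return "depends on national limit"
    return "further treatment needed"
```

A concentration that falls inside the quoted range is deliberately left undecided, because the applicable limit depends on the country of operation.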



3 Activity Model 

The procedure described in the previous section represents the conventional
engineering practice for reaching a decision about the best de-oiling equipment type and
location. The procedure lacks parallelism and some of its activities lack guidance and
information. Inexperienced engineers may find the procedure difficult to apply.
The design procedure was carefully examined in order to develop an activity
model for the design process. UML activity diagrams²¹ were used to represent the
model. Table 1 describes the common symbols that are used.

The top level of the design process is shown in figure 1. By leaving some of the
activities un-ordered where appropriate, the number of stages is reduced from nine
in the conventional procedure to four. The advantage of the new four-stage procedure is
that process integration, i.e. options for mixing different streams, is considered earlier
in the design process. Activities are decomposed into further sub-activities where
necessary. For example, figure 2 shows part of the decomposition of the first activity
in figure 1. Figures 3 and 4 show the decompositions of two other activities.
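The idea of un-ordered activities that meet at a synchronisation bar can be sketched as a small dependency graph. The Python example below is an illustration, not the authors' model: the activity names are taken from the investigations listed in Fig. 3, while the dependency structure (all investigations completing before process integration) and the helper name `ready_batches` are assumptions. It uses `graphlib.TopologicalSorter` from the Python standard library (Python 3.9 or later).

```python
from graphlib import TopologicalSorter

# Activity names follow Fig. 3; the dependency structure is assumed.
investigations = [
    "investigate dispersed hydrocarbon concentrations",
    "investigate toxic metals",
    "investigate suspended solids",
    "investigate pH",
]

# Each activity maps to the set of activities that must finish first.
precedes = {name: set() for name in investigations}
# Synchronisation bar: the next task waits for every investigation.
precedes["process integration"] = set(investigations)


def ready_batches(graph):
    """Yield groups of activities that may be carried out in any order."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()
    while sorter.is_active():
        batch = sorted(sorter.get_ready())
        yield batch
        sorter.done(*batch)


batches = list(ready_batches(precedes))
```

Each batch groups activities with no ordering constraint between them; a batch only becomes available once everything before its synchronisation point has been marked as done.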



Table 1. Activity diagram symbols

Symbol                   Description
[symbol not reproduced]  The start of the process
□                        An activity to be carried out
[symbol not reproduced]  A decision activity with a yes or no answer
[symbol not reproduced]  A synchronisation bar, which indicates that all activities preceding
                         the bar have to be completed before moving on to the next task
►                        Direction of activity flow
o                        The end of the process










Fig. 1. Top level of the activity model for the conceptual design of de-oiling system 




Fig. 2. Part of the decomposition of the activity determine source, magnitude and 
contamination of water 





















[Figure 3: activity diagram. Activities shown: Investigate dispersed hydrocarbon concentrations; Investigate toxic metals; Investigate corrosion inhibitors; Investigate demulsifiers; Investigate suspended solids; Investigate salinity (TDS); Investigate pH; Investigate hardness; Investigate dissolved O₂]

Fig. 3. Decomposition of the activity investigate chemical composition of each stream




Fig. 4. Decomposition of the activity process integration 
















4 Overview of Computer Support Tool 

Rather than being built with a knowledge-based system development tool such as CLIPS²², the
main component of CODES is written in Microsoft Excel. The reason is that Excel is
commonly available and easy to use, and engineers are very familiar with it. The four
conceptual design stages are implemented as four spreadsheets, one for each design stage.
Input from, and output to, the user is done via the
spreadsheet cells. Visual Basic functions are written and linked to specific cells so
that when data in those cells become available the appropriate calculation or inference is
automatically carried out and the output is written to other cells, which may trigger
further computation. Cells are also linked between different spreadsheets, so that
input or inferred data from one stage are made available to another stage. Each
spreadsheet has the same three-part structure:

1. input data required for that particular design stage,

2. results of calculation based on the input data, and

3. comments on results based on expert rules.
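The cell-triggered behaviour described above can be illustrated outside Excel. The following Python sketch is a toy model under stated assumptions, not the CODES implementation (which uses Visual Basic functions linked to cells): the `Sheet` class, the cell names and the 100-unit threshold are all hypothetical.

```python
class Sheet:
    """A toy spreadsheet: formulas re-run when the cells they read change."""

    def __init__(self):
        self.values = {}
        self.formulas = {}  # output cell -> (input cells, function)

    def add_formula(self, output, inputs, func):
        self.formulas[output] = (tuple(inputs), func)
        self._recalculate()

    def set(self, cell, value):
        self.values[cell] = value
        self._recalculate()

    def _recalculate(self):
        # Repeat until no formula produces a new value, so that an output
        # cell can in turn trigger further computation.
        # Assumes there are no circular references between cells.
        changed = True
        while changed:
            changed = False
            for output, (inputs, func) in self.formulas.items():
                if all(name in self.values for name in inputs):
                    result = func(*(self.values[name] for name in inputs))
                    if self.values.get(output) != result:
                        self.values[output] = result
                        changed = True


# Hypothetical cells: two input flows, a calculated total and a comment.
sheet = Sheet()
sheet.add_formula("total_flow", ["flow_a", "flow_b"], lambda a, b: a + b)
sheet.add_formula("comment", ["total_flow"],
                  lambda q: "high flow" if q > 100 else "normal flow")
sheet.set("flow_a", 70)
sheet.set("flow_b", 50)
```

Nothing is calculated until every input of a formula is available, which mirrors the three-part structure: inputs first, then calculated results, then comments derived from those results.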

4.1 Front End 

In order to allow multi-user access in a convenient way, CODES also has a web-based 
front end. Figure 5 shows the initial web page. The four design stages are listed on 
this page and they are linked to the appropriate Excel spreadsheets. When a user 
selects a particular design stage a Java applet is executed to start Excel with the 
appropriate spreadsheet opened. The execution of the Java applet is important. It not 
only retrieves the spreadsheet from the server, it also updates the server with the 
modified version of the spreadsheet when it is closed. Without executing the applet, a 
local copy of the spreadsheet is created by the client and any changes are lost because 
the changed version of the spreadsheet is not passed back to the server. 

The web page also provides links to other useful information such as descriptive 
information about the different design stages, related regulations and data sources. 



4.2 Stage One 

As mentioned earlier, the spreadsheet for each design stage has a three-part structure: 
input data, results of calculation and comments on results. 

Input. From field data and inspection reports, the water sources and their magnitudes
are estimated. The chemical analysis of the process streams provides the basis for
contamination detection. Scale formation due to the presence of O₂, barium, calcium and
sulphate is investigated. The pH level in each stream is required to determine its
effect on de-oiling. Many other investigations are also carried out using the data, such as
emulsion stabilisation due to the presence of corrosion chemicals.







The input data required from the user for each stream include:

• Dispersed hydrocarbon in mg/l

• Solids in mg/l

• Treatment chemicals such as corrosion inhibitors and demulsifiers

• Heavy toxic metals such as Cd, Cr, Cu, Pb, Hg, Mo, Ni, V, Zn

• Scale forming ions such as Ba²⁺, SO₄²⁻, Ca²⁺ and Cl⁻

• Dissolved O₂

• Total hardness

• Total salinity

• pH value

• Oil density in kg/m³ and viscosity in Pa s

• Water density in kg/m³ and viscosity in Pa s

• Interfacial surface tension

• Inlet pipeline internal diameter and construction material




Fig. 5. The front end web page for CODES 










Calculations. The input data are used to assess the contamination limits of the streams,
the applicability of different disposal methods and the suitability of mixing different
streams. The Reynolds numbers and the corresponding flow regimes
for the different streams are calculated in this stage. The Reynolds numbers are also used to
estimate the inlet mixing intensities. Mixing intensity is a measure of the coalescence
process in the pipelines. A rough estimate of the maximum droplet size of the
dispersed oil in each water stream is calculated using the Hinze equation. The Hinze
equation is normally used for unstable emulsions and in the absence of free gas. Its
application here will not give accurate droplet sizes
because free gas is often present in process streams. However, the calculation is
sufficient to provide a rough indication and a comparison with field
data where they are available.
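The paper does not state the formulas used, so the following Python sketch relies on standard textbook forms and should be read as an assumption about how such a calculation could look, not as the CODES code. The Reynolds number is Re = ρvD/μ, the regime thresholds of 2100 and 4000 are conventional pipe-flow values, and the Hinze (1955) correlation is taken as d_max = 0.725 (σ/ρ_c)^0.6 ε^-0.4, where ε is the energy dissipation rate per unit mass. All function names are hypothetical.

```python
def reynolds_number(density, velocity, diameter, viscosity):
    """Re = rho * v * D / mu, with all quantities in SI units."""
    return density * velocity * diameter / viscosity


def flow_regime(re):
    """Conventional pipe-flow thresholds (assumed; the paper gives none)."""
    if re < 2100:
        return "laminar"
    if re <= 4000:
        return "transitional"
    return "turbulent"


def hinze_max_droplet_diameter(interfacial_tension, continuous_density,
                               dissipation_rate):
    """Hinze (1955): d_max = 0.725 * (sigma / rho_c)**0.6 * eps**-0.4.

    interfacial_tension in N/m, continuous_density in kg/m^3,
    dissipation_rate (energy dissipation per unit mass) in W/kg.
    Returns the maximum stable droplet diameter in metres.
    """
    return (0.725 * (interfacial_tension / continuous_density) ** 0.6
            * dissipation_rate ** -0.4)
```

Because the correlation ignores free gas, its output is best treated in the same way as in the text: as a rough indication to be compared with field data.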

Comments and Recommendations. If the mixing intensity is within the proper de-oiling
coalescence limits then the pipeline is considered to promote coalescence, hence
aiding the de-oiling process. On the other hand, if the pipeline does not promote
coalescence then other coalescence devices may be recommended. The user is also
warned of any scale or contamination problem in any of the process streams.

Following the above investigations, the process integration step can be done by 
considering the suitability of mixing different streams. The mixing test investigates 
toxicity of the stream, hardness, salinity and scale formation. Suggestions to 
overcome any process complications due to compositions and inlet conditions are 
given to the user. 



4.3 Stage Two 

Input. In this stage the user is required to input the process description. This includes 
existing equipment types, total flows, temperatures, pressures, solid concentrations, 
oil concentrations and oil particle size. The following information that relates to 
existing equipment needs to be specified where applicable: 

• Type of production separator 

• Type of filter if any 

• Type of gas flotation 

• Hydrocyclone design maker 

• Hydrocyclone inlet and body diameter 

• Type of coalescing devices 

• Inlet diameter to coalescer 

• Spacing of plates in coalescer 

• Inclination of plates 

Calculations. The sheet calculates the Reynolds numbers at different locations of the 
plant, the corresponding friction factors and the mixing intensity for every pipeline 
described. 
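As an illustration of the friction-factor step, the Python sketch below uses two textbook correlations for the Darcy friction factor in a smooth pipe; the paper does not say which correlations CODES applies, so both are assumptions. The second function gives the mean energy dissipation rate per unit mass that follows from the Darcy pressure drop, which is the quantity a Hinze-type droplet estimate requires. The definition of mixing intensity used in CODES is not given in the paper and is not reproduced here.

```python
def darcy_friction_factor(re):
    """Darcy friction factor for a smooth pipe.

    Laminar flow uses f = 64 / Re; otherwise the Blasius correlation
    f = 0.316 * Re**-0.25 is applied. Blasius is intended for turbulent
    flow, so values in the transitional range are only indicative.
    """
    if re <= 0:
        raise ValueError("Reynolds number must be positive")
    if re < 2100:
        return 64.0 / re
    return 0.316 * re ** -0.25


def dissipation_rate(friction_factor, velocity, diameter):
    """Mean energy dissipation per unit mass: f * v**3 / (2 * D), in W/kg."""
    return friction_factor * velocity ** 3 / (2.0 * diameter)
```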







Comments and Recommendations. The system considers the existing equipment
specifications and the process conditions and makes any suggestions that may
enhance the de-oiling process. Expert rules are applied to comment on the space and
maintenance requirements of the field equipment, particularly for gravity settlers, to
ensure compatibility with other choices. Performance checks are carried out for
hydrocyclones. Pipelines are also checked to determine whether they can act as
coalescing devices. The system will recommend the use of coalescence devices within
primary separation and the use of continuous operation instead of batch operation where
applicable. The system will check for, and recommend, placing control valves downstream
of any de-oiling equipment, instead of upstream, to reduce turbulence. The use of low
shear pumps is also recommended.

In addition to critiquing the arrangement of existing equipment, the system will also
generate suggestions as to what type of equipment should be used at different locations.
The user may alter the process conditions to explore the suitability of a wider choice
of de-oiling equipment. Sometimes this type of investigation will lead to better
process economics. For instance, if the temperature is found to be too high for a
certain type of de-oiling equipment, the best solution may be to install a cooler rather
than to change the de-oiling equipment.
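One of the expert rules mentioned above, that shear-inducing devices should sit downstream of the de-oiling equipment, can be sketched as a check over an equipment sequence. The Python example is a hypothetical illustration: the equipment names, the two sets and the wording of the comment are assumptions, not rules extracted from CODES.

```python
# Assumed equipment categories, used for illustration only.
DE_OILING_UNITS = {"hydrocyclone", "gravity settler", "gas flotation",
                   "plate coalescer"}
HIGH_SHEAR_DEVICES = {"control valve", "centrifugal pump"}


def critique_arrangement(line):
    """Return comments for a list of equipment names given in flow order."""
    de_oiling_positions = [i for i, unit in enumerate(line)
                           if unit in DE_OILING_UNITS]
    if not de_oiling_positions:
        return ["no de-oiling equipment found in this line"]
    last_de_oiling = de_oiling_positions[-1]
    comments = []
    for position, unit in enumerate(line):
        # A shear-inducing device ahead of de-oiling equipment breaks up
        # droplets before they can be separated.
        if unit in HIGH_SHEAR_DEVICES and position < last_de_oiling:
            comments.append(
                f"{unit} at position {position} is upstream of de-oiling "
                "equipment; move it downstream or use a low-shear type"
            )
    return comments
```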



4.4 Stage Three 

Input. This stage is concerned with the disposal options for the wastewater streams. If
the water is to be disposed of into the environment, the environmental emission
regulations, which vary from one country to
another, must be met. The user has to provide the following data to complete this stage:

• type of field location, i.e. offshore, inland or coastal 

• country of operation 

• stream bacteria (yes or no) 

• disposal formation analysis such as:

    • scale formation of ion compositions

    • tolerance of formation (high or low)

    • swelling of formation clays (yes or no)

    • movement of formation (yes or no)

    • stimulation possibility (yes or no)

    • end uses of formation, i.e. domestic or industrial

• flow and composition data related to other injection water (if any)

Comments and Recommendations. Disposal methods for the different streams are
recommended. Some comments are made relating to the use of other injection
waters (if any). The comments aim to eliminate any complications that might
arise from unsuitable components in the injection water streams, because sea,
river or lake water can introduce scale, bacteria or solids into the injection formations.
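The stage-three inputs and the injection-water comment can be sketched as a small record plus one rule. This Python example is an assumption-laden illustration: the field names and the `DisposalInput` class are hypothetical, only a subset of the listed inputs is modelled, and the single rule encodes just the statement above about sea, river or lake water.

```python
from dataclasses import dataclass
from typing import List, Optional

SURFACE_WATERS = {"sea", "river", "lake"}


@dataclass
class DisposalInput:
    location: str                      # "offshore", "inland" or "coastal"
    country: str
    stream_bacteria: bool
    formation_tolerance: str           # "high" or "low"
    clay_swelling: bool
    other_injection_water: Optional[str] = None  # e.g. "sea"; None if unused


def injection_water_comments(data: DisposalInput) -> List[str]:
    """Comment on risks from mixing with another injection water."""
    comments = []
    if data.other_injection_water in SURFACE_WATERS:
        comments.append(
            f"{data.other_injection_water} water can introduce scale, "
            "bacteria or solids into the injection formation"
        )
    return comments
```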







4.5 Stage Four 

Input. The equipment options generated in stage two are investigated further here to 
provide ratings for comparison. 

Calculations. A rating for each different equipment option is calculated based on 
cost, space, weight, complexity and maintenance requirement. 

Comments and Recommendations. A final recommendation for the types of de- 
oiling equipment that should be used at different locations is generated. The design 
specifications for the equipment, control configuration and secondary stream options 
are also given. Recommendation for further treatment is indicated if the suggested 
equipment does not achieve the quality requirement. 
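The paper names the five rating criteria but not the scoring scheme. A weighted mean is one common way to combine such criteria, and the Python sketch below uses it purely as an assumption; the function names, the equal default weights and the convention that a higher score is more favourable are all hypothetical.

```python
CRITERIA = ("cost", "space", "weight", "complexity", "maintenance")


def rate_option(scores, weights=None):
    """Weighted mean of per-criterion scores (higher means more favourable)."""
    if weights is None:
        weights = {name: 1.0 for name in CRITERIA}
    total_weight = sum(weights[name] for name in CRITERIA)
    weighted = sum(weights[name] * scores[name] for name in CRITERIA)
    return weighted / total_weight


def rank_options(options, weights=None):
    """Return option names sorted from the best rating to the worst."""
    return sorted(options,
                  key=lambda name: rate_option(options[name], weights),
                  reverse=True)
```

Changing the weights lets the same scores be re-ranked, for instance to stress space and weight when the criteria are not equally important.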



5 Summary and Conclusions 

DSSs are commonly used in petroleum exploration, production and refinery
operations. This paper described a novel application of DSS techniques to support the
conceptual design of de-oiling systems. The project began by modelling the design
activities, and UML activity models were produced. The modelling process helped to
clarify the activities that need to be carried out and the order in which they can be
carried out. Greater flexibility is introduced by allowing appropriate activities to
happen in parallel. Knowledge elicitation was carried out to formalise the required
knowledge from experts, books, design guides, etc. CODES was implemented and
tested using a case study. It was found to be user friendly and to reduce the time
required to carry out the conceptual design. Further development of CODES is
required to make it more robust and to incorporate more knowledge.



References 

1. Dharaphop, Jirapong: Expert System for Disposal of Produced Water from 
Petroleum Exploration and Production in New Mexico, MSc. Thesis, New 
Mexico Institute of Mining and Technology, New Mexico (1993) 

2. Eckles, Wesley W. Jr.: Expert System for Quantitative Log Analysis, Petroleum 
Engineer International, Vol. 63 (June) 72 

3. Miller, Betty A.: Object Oriented Expert Systems and their Applications to 
Sedimentary Basin Analysis, U.S. G.P.O., Denver (1993) 

4. Soto Becerra, Rodolfo.: An Expert System to Select the Appropriate Fracture 
Treatment Design Model, MSc. Thesis Texas A & M University, Texas (1992) 

5. Affleck, Noel; Zamora, Mario: PC-Based Expert System Aids Optimum Mud
Selection, Petroleum Engineer International, Vol. 59 (January) 38

6. Courteille, J. M.; Fabre, M.; Hollander, C. R.: An Advanced Solution: The 
Drilling Advisor, Journal of Petroleum Technology, Vol. 38 (August) 899-904 

7. Eckles, Wesley W. Jr.: Expert System for Casing and Tubing Strings, Petroleum 
Engineer International, Vol. 63 (August) 55-58 







8. Onan, D. D.; Kulakofsky, D.; Van Domelen, M. S.: Expert Systems Help Design 
Cementing and Acidising Jobs, Oil & Gas Journal, Vol. 91 (April) 59-61 

9. Kulakofsky, David; Crook, Ronald J.: Knowledge Based Expert System Ease 
Cement Slurry Design, Offshore, Vol. 52 (June), Oklahoma 43 

10. Cadmus, Richard H.; Woosley, Melvin D.: Expert Systems Complement 
Refinery Information Systems, Oil & Gas Journal, Vol. 87 (January) 50-54 

11. Takahashi, Kimikazu.; Kateeshock, Tom.: Expert System for Refinery Off-Site 
Facility Management, ISA Transactions, Vol. 31, No.2, 67-75 

12. Damaron, E. Bruce.; Schulze, Randall T.; Bochsler, Daniel C.: Well Control 
Becomes Target for Expert Systems, Oil and Gas Journal, Vol. 87 (February) 35- 
40 

13. Xiong, Hongjie: STIMEX - An Expert System Approach to Well Stimulation
Design, PhD. Thesis Texas A & M University, Texas (1992)

14. Khan, Sameer Ali: An Expert System to Aid in Compositional Simulation of 
Miscible Gas Flooding, PhD. Thesis University of Texas at Austin (1992) 

15. Ayral, T. E.: On Line Expert System for Process Control, Hydrocarbon 
Processing, Vol. 68 (June) 61-63 

16. Heywood, C. H.: Pipeline SCADA Systems: Yesterday, Today, Tomorrow, 
Pipeline Industry, Vol. 67 (August) 46 

17. Kobyakov, A. I.: Include Heuristics in Protection Systems, Hydrocarbon 
Processing, Vol. 72, No.2, 79 

18. Ramanathan, Prasad.; Kannan, Suresh.; Davis, James F.: Use Knowledge Based 
System Programming Toolkits to Improve Plant Troubleshooting, Chemical 
Engineering Progress, Vol. 89 (June) 75-84 

19. Spriggs, Kevin V.: The Uses of Rule Based Programming in a Unit Level 
Programmable Controller at Auburn University’s Waste Oil Reprocessing 
Facility, MSc. Thesis Auburn University (1986) 

20. Touchstone, Terrel.; Blackwell, Derek E.; Carter, Grady E.: Expert Systems 
Trains, Advises Process Operators, Oil and Gas Journal, Vol. 88 (February) 41- 
44 

21. Fowler, Martin; Scott, Kendall: UML Distilled: Applying the Standard Object
Modelling Language, Addison-Wesley, USA (1997)

22. Giarratano, J.: CLIPS User’s Guide, NASA, Lyndon B. Johnson Center 
Information Systems Directorate, Software Technology Branch (1993) 




ProCon: Decision Support for Resource Management 
in a Global Production Network 



Florian Golm¹ and Alexander V. Smirnov²

¹ FFA Ford Research Center Aachen
Suesterfeldstrasse 200, D-52072, Aachen, Germany
fgolm@ford.com

² St.Petersburg Institute for Informatics and Automation
of the Russian Academy of Sciences
39, 14th Line, St.Petersburg, 199178, Russia
smir@iias.spb.su



Abstract. The application of modern transportation and communication means to
global and local (region-oriented) inter-enterprise business
collaboration within a Virtual Enterprise is termed a Virtual Global Production
Network (GPN). A new approach to GPN configuration, called the “Affordable
Cost Structure approach”, is described. This approach is oriented towards improving
investment efficiency over total facility life-time. The kernel of this approach is a
distributed multi-level constraint satisfaction technology based on a shared
domain knowledge model “product - process - resources”.



1. Introduction 

A virtual Global Production Network (GPN) can be defined as a flexible connection 
of appropriate production modules at different locations with the target to fulfill a 
concrete production task. The consortium exists for a predefined period of time. The 
network becomes real when a concrete realization takes place or at least the necessary 
budget is endorsed. During the planning phase the GPN represents a planning subject 
in order to design and evaluate potential scenario solutions for the production task. 
Figure 1 explains the basic concept of the GPN, a concept that Ford Motor Company 
intends to realize in the area of manual transmission with the target to coordinate and 
harmonize global production activities in a virtual distributed consortium of 
plants [3]. 

The virtual GPN has its origin in the concept of virtual enterprises [1, 2, 4, 5, 6]. The
approach is similar, but it refers to the coordination of production facilities within one
company. Nevertheless, the integration of external suppliers of production modules
is possible in principle. It is chiefly a horizontal structure, which means that the
involved plants have equal rights and responsibilities. Legal and formal
circumstances are of minor relevance - these are rather the focus of virtual enterprise
constellations.



R. Loganantharaj et al. (Eds.): IEA/AIE 2000, LNAI 1821, pp. 345-350, 2000.
© Springer-Verlag Berlin Heidelberg 2000








Fig. 1. Configuration of global production networks

The current situation in Ford's global production planning is as follows: (1) partially free
capacities in product dedicated plants; (2) insufficient shared funding for cross
program projects; (3) cycle plan instability, which prevents effective long term planning; (4)
information deficits in the central planning area; and (5) an insufficiently holistic approach to
optimizing the entire production network.

In the area of manual transmissions, Ford owns four production plants, distributed
across Europe and Brazil. The production network consists of the different plants
or, more precisely, their production modules, which represent the general production
potential in a structured and more detailed way. Production modules are capable of
producing parts or sub-components, representing sub-sequences of the entire
production process. In the new concept a central planning staff is responsible for the
coordination of production engineering and investment activities in order to fulfill
the requirements of new manufacturing tasks. The need to change
the production system arises from shifting production volumes of existing products
or from the innovation of technologies or products in reaction to changing market
situations.

This approach was realized in Ford’s project “Configuration and Optimization of
Global Production Networks in Order to Improve Investment Efficiency over Total
Facility Life-Time”. The general objective of the project was the “development of a
methodology and a software tool for selection and reuse of facilities; investment
calculation on life cycle level and configuration of global production network, taking
into account quantitative and qualitative aspects in cross program planning”.

The globalization of industry has increased the need for industry standardization as a
methodology to avoid duplication and misinterpretation. Today the corporate knowledge
of large industrial companies is distributed among many DBs. There is an increasing need
to manage industrial knowledge in order to respond to external market forces (such as
speed of change, cycle-time reduction, globalization, etc.) and certain internal
changes. As a result, new information technologies such as constraint satisfaction,
knowledge management and others are attracting increasing interest from industrial
companies [8]. The goal of the project was to develop a methodology and tools for the
automated re-use of industrial experience from large collections of data and knowledge
in the engineering and business domains.

The knowledge-based decision support system “ProCon” (Production network
Configuration tool set) was developed as one of the project results. ProCon focuses
on the early stages of the planning procedure for investment calculation, namely
(a) derivation of production scenarios, (b) determination of investment cost,
(c) assignment of locations and (d) estimation of product variable cost.

The paper discusses a generic methodology and the architecture of a knowledge-based
environment (called ProCon) for GPN configuration as a resource management
technology.



2. Global Production Network Configuration as a Resource 
Management Technology 

GPN reconfigurability is becoming more and more important. Configuration, as one of the
business process reengineering technologies, helps to keep legacy applications in
business by transforming their current architecture into a new, more maintainable one.
It is the discipline of identifying the configuration of a system at discrete
points in time for the purpose of making controlled changes to the configuration and
maintaining the integrity of the configuration throughout the system life cycle. The
main elements of a “Configuration” entity are
a baseline, a set of changes and a sequence of changes.

Configuration consists of two aspects: configuring/reconfiguring and
configuration maintenance. Configuring deals with creating configuration solutions; it
involves selecting components and the ways of configuring them. Reconfiguring
is the process of adapting old configurations to new situations.
Configuration maintenance deals with maintaining a consistent configuration under
change; this requires consistency among the selected components and decisions.
When a decision for selected components changes, configuration maintenance must 
trace all the decisions related to the changed decision and revise them, if necessary, to 
maintain consistency among the components and decisions. 
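The tracing step in configuration maintenance can be sketched as a traversal of a decision dependency graph. The Python example below is illustrative only: the paper does not describe the data structures used in ProCon, so the `depends_on` mapping, the function name and the breadth-first strategy are assumptions.

```python
from collections import deque


def decisions_to_revise(depends_on, changed):
    """Return every decision that directly or indirectly depends on `changed`.

    depends_on maps each decision to the decisions it was based on.
    """
    # Invert the mapping: decision -> decisions that were based on it.
    dependents = {}
    for decision, bases in depends_on.items():
        for base in bases:
            dependents.setdefault(base, set()).add(decision)

    affected = []
    seen = {changed}
    queue = deque([changed])
    while queue:
        current = queue.popleft()
        for following in sorted(dependents.get(current, ())):
            if following not in seen:
                seen.add(following)
                affected.append(following)
                queue.append(following)
    return affected
```

Decisions outside the returned list are untouched by the change and need no revision, which keeps the re-examination local to what actually depends on the changed decision.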

Since the participants of a GPN each perform only a part of the technological process, the main
problem is sharing the partial processes between plants. Two basic stages can be
distinguished in solving this problem: (i) partition of the technological process, and (ii)
building the network of plants in the GPN. The second stage includes three sub-stages:
(1) identifying potential participants of the GPN, (2) estimating the strategic interests and
risks of potential participants of the GPN, and (3) optimisation of the final GPN
configuration.

The process begins with composing the initial GPN configuration; in other words,
the appropriate production modules must be identified considering the potential
manufacturing processes. The final configuration of a GPN may be developed using
an iterative procedure, starting with an initial configuration and applying a progressive
sequence of changes, each of which determines an intermediate configuration. After each iteration
negotiations may be necessary to ascertain whether the configuration is acceptable to
every unit. If it is not, the next iteration is undertaken. It is desirable to
automate these negotiations as far as possible between the plant manager and the GPN
manager, who represents the GPN as a whole. In this context, it could be most efficient to
apply engineering and management methods to form a system’s “product-process-resource”
model satisfying the constraints on manufacturing resources, such as
investment costs, production layout, capacity, and lead time [3], [6], [7].
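The iterative procedure can be sketched as a loop over a baseline and an ordered sequence of changes, with each unit represented by an acceptance check. This Python example is a simplified assumption: real negotiations between plant and GPN managers are replaced by boolean predicates, and the function name and data layout are hypothetical.

```python
def configure(baseline, proposed_changes, units):
    """Apply changes one at a time until every unit accepts the configuration.

    baseline: dict describing the initial configuration.
    proposed_changes: ordered list of dicts, each updating the configuration.
    units: dict mapping a unit name to an acceptance predicate.
    Returns (configuration, applied_changes, accepted).
    """
    configuration = dict(baseline)
    applied = []
    remaining = list(proposed_changes)
    while True:
        rejecting = [name for name, accepts in units.items()
                     if not accepts(configuration)]
        if not rejecting:
            return configuration, applied, True
        if not remaining:
            # No further changes to try; report the last intermediate state.
            return configuration, applied, False
        change = remaining.pop(0)
        configuration.update(change)
        applied.append(change)
```

The returned list of applied changes, together with the baseline, records how the final configuration was reached.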



3. ProCon Architecture 

The environment “ProCon” aims at establishing a knowledge platform enabling
manufacturing enterprises to achieve reduced lead time and reduced cost based on
customer requirements through customer satisfaction by means of improved
availability, communication and quality of product information. ProCon follows a
decentralized method for intelligent access to knowledge and solutions. The configuring
process incorporates the following features: order-free selection, resource limits,
optimization (minimization or maximization), default values and the freedom to make
changes in the GPN model.

ProCon distinguishes between two planning levels: the central planning area and the
decentralized planning area of the distributed plants. Every production program
project or planning activity is initiated by a request for the manufacturing of a
product in a predefined volume and time frame. Starting from that, the central
planning staff has the task of defining a production system capable of meeting these
requirements. The staff engineers prepare the request for the plants. They harmonize
and aggregate parallel incoming requests over different planning periods. In response,
the plants offer their production modules as a contribution to the entire network
system. At plant level, engineers have to analyze the manufacturing potential
of their facilities with respect to capacity and process capability. The expertise for
developing and engineering the production modules is explicitly available only in
the plants.

The planning engineer in the central planning staff acts like a network broker. He
has the task of configuring the production networks on the basis of the offered
production modules. In this configuration process, he has to take into account
additional restrictions such as global (logistical, strategic and political) constraints.
ProCon consists of three software tools (IT-modules).

Module 1: This IT-module is intended for use by the central planning staff. Its
general function is to coordinate and harmonize all incoming requests. This IT-module
supports the general preparation of the requests for the different plants. In
addition, the tool assists in the definition of standards as well as the propagation of
best practices, the precondition for implementing a reasonable degree of
commonality between the plants in order to enable synergy effects in the network.







Module 2: The generation of the production modules is supported by IT-Module 2.
This tool is applied by planning engineers in the plants. The production modules
are configured on the basis of existing production facilities and, where capacity is
lacking, new facilities. Technical adaptations are made in order to
meet the manufacturing requirements given by the requests and by the product to be
produced. Taking all necessary investments into account, the costs of product and
processes can be calculated over the total project or facility lifetime. This information is
essential for the offer that the plants hand back to the central planning staff.

Module 3: On the basis of the production modules, the configuration of the entire 
production network takes place in IT-Module 3. Additionally, strategic constraints 
can be taken into account in order to optimize the entire network. Criteria such as 
local content or preferences for specific plants due to time or quality restrictions, as 
well as logistical constraints, are relevant for the decision-making process on this level. 

In order to design a GPN that can be reconfigured to meet changing production 
demand, one has to understand the relationships in the conceptual model of the 
system "GPN product - GPN process - GPN resource" that satisfy the global and local 
constraints. The GPN model divides the resource management system into five levels: 
location (plant), module, line, machine, and resource (cost center). An example of the 
hierarchy "product - subassembly - part - technological process" is shown in 
Figure 2. The GPN model serves as a knowledge repository; it is programmed in 
Visual C++ and works with relational databases through the ODBC interface. 




Fig. 2. ProCon: Hierarchy Dialog 



4. Conclusions 



The knowledge-based "Affordable Cost Structure" approach targets an efficient 
planning process on the basis of global production networks with respect to (a) cross- 
program planning, (b) reuse of resources, and (c) more secure planning data. An 
assessment of ProCon (backed by a case study) showed high potential economic and 
qualitative benefits from applying the approach. 

In the future the project could develop in the following directions: 








1. Methodical verification and adaptation: analyze and modify the as-is planning 
process, identify additional methodical needs, analyze and implement the necessary data, 
and train the people concerned (planning staff, plant engineers, etc.). 

2. Software engineering: improve data structures and algorithms, adapt to Ford 
standards, implement links to existing software tools, improve ergonomics (GUI) and 
usability, coordinate with running Ford projects, add network capability, provide training. 

3. Organizational framework: implement the to-be planning process; competence 
structure, people, processes; balance this approach and the organization; production 
module definition, adapted cost structure; data and information acquisition, preparation, 
and maintenance. 



References 

1. Eversheim, W.; Franke, R.; Kalkert, W.; Schuh, G. et al.: Kooperative 
   Wertschöpfung - Produkt, Prozeß, Ressourcen Wettbewerbsfaktor 
   Produktionstechnik. AWK '96 Proceedings. VDI-Verlag GmbH, Düsseldorf 
   (1996) 

2. Hirsch, B.: Information System Concept for the Management of Distributed 
   Production. Computers in Industry, Vol. 26. Elsevier Science B.V. (1995) 229-241 

3. Golm, F.: Planung globaler Produktionsnetzwerke. Proceedings of Deutscher 
   Logistik Congress, Berlin, Oktober (1999) 

4. Gulledge, T.R.; Sommer, R.A.: Aligning strategic objectives with organizational 
   processes: a methodology for virtual enterprise implementation. Proceedings of 
   the International Conference of the Manufacturing Value Chain, August '98, 
   Troon, Scotland. Kluwer Academic Publishers (1998) 

5. Preiss, K.; Goldman, S.L.; Nagel, R.N.: Cooperate to Compete - Building Agile 
   Business Relationships. Van Nostrand Reinhold (1996) 

6. Smirnov, A.V.: Virtual Enterprise Configuration Management. Proceedings of the 
   14th IFAC World Congress (IFAC'99), Beijing, China, July 1999. Vol. A. 
   Pergamon Press, London (1999) 337-342 

7. Smirnov, A.V.; Sheremetov, L.B.: Configuration of Complex Systems Based on the 
   Technology of Intelligent Agents. Automatic Control and Computer Sciences, 
   Vol. 32 (4). Allerton Press, Inc., New York (1998) 15-24 

8. Young, R.E.; Greef, A.; O'Grady, P.: An Artificial Intelligence Constraint Network 
   System for Concurrent Engineering. Int. J. Production Engineering, Vol. 30, N 7 
   (1992) 1715-1735 




Intelligent Infrastructure 
That Support System’s Changes 



Jovan Cakic 

Computing Laboratory, University of Kent, Canterbury, Kent, CT2 7NF, UK 
jc48@ukc.ac.uk 



Abstract. This paper tries to explore the possible role for Artificial Intelligence 
(AI) in the infrastructure of the Web application of tomorrow. An effort is made 
to conform proposed solutions to the existing infrastructure of the Windows 
NT-based Web application. Supported by AI services, the Web application will 
become more flexible in its behavior and more open to architectural changes. 



1 Introduction 

Artificial Intelligence for the infrastructure of the Web application will be discussed 
in the context of: 

• Integration, as a prerequisite for structural changes. Since its early days, the software 
industry has been looking for ways to achieve productivity and efficiency in 
software development. Today, productivity and efficiency in the software industry 
mean modularity and integration or, in other words, software components. 
Component-based development of traditional AI concepts should provide small, 
fast and fully functional AI components. 

• Evolution, as a necessity. Evolution of a system is usually hard to achieve because of 
its monolithic architecture and the tight coupling among its components. In order to 
enable and facilitate the system’s need for change, design strategies usually rely on: 

• Multi-tier Client - Server architecture 

• Component-based development 



[Figure: Server with its Interface and Functionality (service 1, service 2, service 3)] 

Fig. 1. Client - Server architecture 

Although these two concepts are key to flexible system design, the Client is still very 
dependent on the Server’s interface in two respects: 

• The Client needs to know the Server’s interface in order to use its services 

• The Client is unable to benefit from the Server’s improvement: even if backward 
compatibility is maintained, new services cannot be used because the Client is unaware 
of the change (Figure 3) 

R. Loganantharaj et al. (Eds.): IEA/AIE 2000, LNAI 1821, pp. 351-356, 2000. 
© Springer-Verlag Berlin Heidelberg 2000 








[Figure: Server with its Interface and Functionality (improved): service 1, service 2, service 3] 

Fig. 2. Improvements to the Server’s internal functionality are transparent to the Client 

The above describes a universal problem of the consumer - producer relationship, 
in which: 

• you cannot use something if you don’t know what it offers 

• you cannot use the best services if you are not familiar with the latest offer 



[Figure: Client and Server; the Server’s Interface exposes its Functionality (expanded): service 1, service 2, service 3, new service 1, new service 2] 

Fig. 3. Expansion of the Server’s interface is transparent to the Client 



The problem lies mainly in the asymmetric nature of the Client - Server relation, in which 
the Server offers its services in some form and doesn’t need to know its Client. In order to 
use them, the Client needs to understand the offer from the Server. 



[Figure: New Client and Server; the Server’s Interface exposes its Functionality (expanded): service 1, service 2, service 3, new service 1, new service 2] 

Fig. 4. Expansion of the Server’s interface is visible only to the New Client 



Client - Server interaction. In a more symmetric connection, the Client doesn’t need to 
know exactly how to use services from the Server. The Client’s task would be to describe 
the service it expects from the Server, and the Server will try to solve the problem 
using its current services in the optimal way. A similar approach has already been used 
in commercial software. In order to improve and optimize their use, some software 
packages introduce automation, the so-called Wizard, for several predefined use case 
scenarios. The rationale for this automation is the reasonable assumption that particular 
goals will be attained more easily if the application takes over control from the 
user and engages itself in an optimal way. 








Intelligent Client - Server interaction. A universal approach is to put 
intelligence on the Server side, so that the Server is capable of intelligent interaction with 
the Client. In that kind of architecture, the Client sees the Server interface as a formal 
way to describe a problem, rather than a set of methods it needs to understand. 



[Figure: Client, Intelligent Interface, and Server; the Server’s Functional Interface exposes its Functionality: service 1, service 2, service 3, new service 1, new service 2] 

Fig. 5. Intelligent interface handles the evolution of the Server 



The new approach is based on the Multi-tier Client - Server architecture, in which the 
Middle tier is used to decouple Client - Server ties. The basic idea is to give much 
more importance to the Middle tier by making it intelligent, so that it is: 

• capable of communicating with the Client at a higher level of abstraction, hiding 
the functional aspect of the Server’s interface 

• responsible for the management of the Server’s functional components 

It is the task of AI technology to refine and develop the Middle tier, so that a 
traditional Web application can offer the user: 

• easy GUI personalization (Client side GUI created from components) 

• human-like interaction (Server side intelligence) 

• responsiveness and dynamic reconfiguration (IIS infrastructure improvement) 



[Figure: Client 1, Component Manager, COM service 1, COM service 2, COM AI Logic Server, System Knowledge] 

Fig. 6. Intelligent Client - Server architecture 



A general intelligent interface can be described as a Component Manager, responsible 
for: 

• dynamic management (adding and removing) of the Server’s functional components 

• exposing the Server’s functionality to the Client at a higher level of abstraction 

A more detailed analysis of the Component Manager is given later in this paper, 
using the Internet Information Server (IIS) as an example. 
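
The two responsibilities listed above can be illustrated with a minimal sketch. It is not the paper's implementation (which targets COM components under IIS); the class name ComponentManager, its methods and the string-based "capability" names are all hypothetical, chosen only to show how a middle tier can add and remove functional components at run time while the Client keeps using one unchanged entry point.

```python
from typing import Callable, Dict, List


class ComponentManager:
    """Hypothetical middle-tier registry standing between Client and Server."""

    def __init__(self) -> None:
        # capability name -> callable implementing it (stand-in for a component)
        self._services: Dict[str, Callable[[str], str]] = {}

    def add(self, capability: str, service: Callable[[str], str]) -> None:
        """Register a functional component while the system is running."""
        self._services[capability] = service

    def remove(self, capability: str) -> None:
        """Unregister a component; unknown names are ignored."""
        self._services.pop(capability, None)

    def capabilities(self) -> List[str]:
        """What the Server currently offers, by name."""
        return sorted(self._services)

    def request(self, capability: str, payload: str) -> str:
        """Single Client entry point: describe what is wanted, not how to call it."""
        service = self._services.get(capability)
        if service is None:
            raise LookupError(f"no component offers {capability!r}")
        return service(payload)


manager = ComponentManager()
manager.add("uppercase", str.upper)
result_before = manager.request("uppercase", "hello")

# The Server evolves: a new component is registered without touching the Client code path.
manager.add("reverse", lambda text: text[::-1])
result_after = manager.request("reverse", "hello")
```

In this sketch the "higher level of abstraction" is reduced to a capability name; the paper envisages a richer problem description that is resolved by an AI Logic Server.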









Web applications. In the typical Web application, the balance between Client and Server 
functionality is changed compared to the traditional Client - Server application. A Web 
application gives much more importance to the Middle tier because of its distributed 
nature. 



2 Intelligent IIS: New Infrastructure for the Middleware 

In the Multi-tier Client - Server architecture of an NT-based Web application, IIS 
represents the infrastructure of the Middle tier. As a Middle tier, the IIS architecture and 
functionality are essential for the system’s: 

• Changes. Since the Middle tier is created in order to handle business logic and its 
evolution, IIS is responsible for supporting and facilitating the system’s changes. 

• Scalability. The Client never accesses the Server directly, but through interaction with 
IIS. Therefore, the performance of IIS and its components strongly affects the performance 
and consequently the scalability of the entire Web application. 



2.1 Server Side Interface 

If we look at the architecture of a typical Web application (Figure 7) and compare it 
to the model in Figure 5, it is easy to identify the functional aspect of the Server’s 
interface as a set of ISAPI filters. ISAPI filter organization is hierarchical, based on 
multiple priority levels. Filters intercept the input data stream in hierarchical order. 



[Figure: Web Client, TCP/IP Network, Internet Information Server (WWW Service; Filter 1, Filter 2 and ASP as DLLs; ISA 2 and ISA 3 as DLLs; Active Server Components), Domain Database] 

Fig. 7. Web application Architecture 



ISAPI is the key mechanism for controlling the Server side interface. Whenever a Client 
makes an HTTP request to a Web Server, ISAPI filters get a chance to intercept the 
request. For example, it is possible to create a custom file extension and an ISAPI filter that 
intercepts all Client requests for files with that extension. The ISAPI filter will do some 
processing and return HTML code to the Client. The custom file format can include a 
custom markup language that the ISAPI filter understands. 











By implementing the model described previously in Figure 6, the actual IIS 
infrastructure would be improved in the areas of: 

• Flexibility. The entire IIS filter configuration is loaded at Server startup and cannot be 
changed without restarting the Server. IIS is modular and extensible, but its filter 
modules are traditional DLLs. The benefits of COM for flexibility are 
very important. 

• Control. Each filter implements its own logic, and the common behavior of IIS is controlled 
only by the hierarchical organization of the filters. It would be much better to have shared 
filtering logic in order to support dynamic reconfiguration of IIS. 



2.2 The New Model for IIS Filters Mechanism 



The new model for the IIS filter mechanism (Figure 8) is based on the more general 
model given in Figure 6. Key components of the new model are: 

• Filter Manager. In order to integrate the new model into the existing IIS infrastructure, the 
Filter Manager will be implemented as a standard ISAPI filter (C++ DLL) with the 
highest priority level. Its primary tasks are preprocessing and dispatching the input 
data stream using the rules implemented in the Filter Knowledge base. 

• AI Logic Server. The AI Logic Server is implemented in the form of a COM object 
with the functionality of a standard Production System. Its responsibility will be to solve 
Filter Manager logic problems, typically of the form: "How to handle this task?" 

• Filter Knowledge. Filter Knowledge is a simple ASCII knowledge base file. The 
knowledge describes how to handle specific Client tasks using the 
existing IIS filter set. 

• Legacy ISAPI filters. Standard ISAPI filters in DLL modules. 

• New ISAPI filters. New ISAPI filters will be implemented as COM objects. 
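
The interplay of Filter Manager, logic server and Filter Knowledge can be sketched as follows. This is an illustration under stated assumptions, not the paper's design: the rule syntax ("suffix -> filter, filter"), the filter names and the first-match lookup that stands in for a production system are all invented here, and plain Python callables replace ISAPI/COM filters.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical ASCII knowledge base: one rule per line, "<url suffix> -> <filter>, <filter>".
FILTER_KNOWLEDGE = """
.rpt -> auth, report
.asp -> auth, asp
*    -> static
"""

Rule = Tuple[str, List[str]]


def parse_rules(text: str) -> List[Rule]:
    """Turn the text knowledge base into (condition, filter names) pairs."""
    rules: List[Rule] = []
    for line in text.strip().splitlines():
        condition, _, action = line.partition("->")
        rules.append((condition.strip(), [name.strip() for name in action.split(",")]))
    return rules


def select_filters(rules: List[Rule], url: str) -> List[str]:
    """Stand-in for the logic server: the first rule whose condition matches fires."""
    for condition, filters in rules:
        if condition == "*" or url.endswith(condition):
            return filters
    return []


def dispatch(rules: List[Rule], registry: Dict[str, Callable[[str], str]], url: str) -> List[str]:
    """Filter Manager step: ask which filters apply, then run them in order."""
    return [registry[name](url) for name in select_filters(rules, url)]


# Placeholder filters; real ones would be DLL or COM modules.
registry: Dict[str, Callable[[str], str]] = {
    "auth": lambda url: f"auth ok for {url}",
    "report": lambda url: f"report rendered from {url}",
    "asp": lambda url: f"asp executed {url}",
    "static": lambda url: f"static file {url}",
}

rules = parse_rules(FILTER_KNOWLEDGE)
trace = dispatch(rules, registry, "/sales/q1.rpt")
```

Because the routing lives in the text rules, changing which filters handle a request means editing FILTER_KNOWLEDGE and re-parsing it, which mirrors the reconfiguration-by-editing idea of the model.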



[Figure: Web Client, TCP/IP Network, Filter Manager (Dll), WWW Service, New ISAPI Filters (Filter 1, Filter 2, Filter 3 as COM), Legacy ISAPI Filters (Filter 1, Filter 2 and ASP as DLLs), AI Logic Server (COM), Filter Knowledge] 

Fig. 8. New Server - side interface model 








2.3 Benefits 

The benefits of the new model for the IIS filter mechanism are: 

• Integration with IIS. The new model for the IIS filter mechanism is created with 
respect for backward compatibility and easy integration into the existing IIS. 

• On-demand loading of IIS filter components. Traditionally, the entire IIS filter 
configuration is loaded at Server startup and cannot be changed without restarting 
the Server. With the new model, an IIS filter is loaded only on demand. 

• Easy changes to IIS behavior. With the new model, the entire filtering logic is 
implemented in the Filter Knowledge base. Dynamic reconfiguration, including 
registering new IIS filter components, is as easy as editing the Filter Knowledge base. 

• COM technology for Filters. ISAPI filters in the form of COM components will bring 
to the IIS design process all the benefits of component-based development and 
make Server - side modules more consistent. 

• General Logic Services. General Logic Services from the AI Logic Server are not 
restricted to supporting IIS filtering logic. As an easy-to-use plug-in component, the AI 
Logic Server offers general services to all interested Server - side modules, in order 
to help them behave in a truly intelligent way. 



3 Conclusions 

Technological infrastructure plays a very important role in a system’s aptitude for 
change, so it is expected to be flexible, robust and open. This is especially true for a 
Web application, because of its distributed nature. The potential support coming from 
AI could be very important for the Web application of the future, and upcoming 
developments in AI will certainly open even greater possibilities. 



References 

1. Omahen, J., Cmahen, J.: Active Platform: A Developer's Guide: Microsoft 
Solutions for Next Generation Web Sites, IDG Books, 1998 

2. Hudson, K., Pastore, A. M.: IIS 4.0 Rapid Review Study Guide, 29th Street Pr, 
1999 

3. Racy, M., Tracy, M.: Professional Visual C++ Isapi Programming, Wrox, 1996 

4. Keyton, A. W., Petrusha, R.: ASP in a Nutshell, O'Reilly & Associates, 1999 




Using Description Logics for Case-Based Reasoning 
in Hybrid Diagnosis 



Yacine Zeghib, Francois De Beuvron, and Martina Kullmann 
LIIA, ENSAIS 
24, bld de la Victoire, 67084 Strasbourg, France 
zeghib@liia.u-strasbg.fr 



Abstract. We show how description logics can be used for modeling a case 
base for case-based reasoning. To illustrate this approach, we apply it to hybrid 
diagnosis. The case-based reasoning component of a hybrid diagnosis system 
exploits description logic inferences for classifying and querying the case base. 
As description logic interpreter we use the system CICLOP, whereas the 
diagnosis system is implemented in G2. The description logic system runs as a 
server application and can thus be queried by the diagnosis system. 



1 Introduction 

The hybrid diagnosis system presented in [1] consists of two parts: a diagnosis 
component and a case-based reasoning (CBR) component. The diagnosis component 
interacts with the CBR component in order to achieve the diagnosis task. In this paper 
we present an application of description logics, namely to model a case base and to 
perform the corresponding CBR steps. For the implementation 
of the CBR system we use the description logic system CICLOP (Customizable 
Inference and Concept Language for Object Processing) [2]. It is a knowledge 
representation system developed by the Laboratoire d'Informatique et d'Intelligence Artificielle 
(LIIA). The structure of the paper is as follows: In Sect. 2 we briefly describe the 
case-based reasoning cycle. In Sect. 3 we give an overview of description logics. 
Section 4 is devoted to the development of the case base model and its description 
logic implementation. The main tasks in the CBR cycle are presented in Sect. 5. In 
Sect. 6 the general behavior of the CBR system is developed. Section 7 presents an 
example to illustrate our method. Finally, we draw some conclusions and discuss the 
remaining problems on which we will work in the future. 



2 The Case-Based Reasoning Cycle 

Case-based reasoning is a problem solving paradigm, in which a new problem is 
solved by exploiting similar previous cases [3]. The four processes of the CBR-Cycle, 
which are illustrated in Fig. 1, are briefly described in the following (see also [4], [5]): 



R. Loganantharaj et al. (Eds.): IEA/AIE 2000, LNAI 1821, pp. 357-366, 2000. 
© Springer-Verlag Berlin Heidelberg 2000 




Fig. 1. The case-based reasoning cycle 

• Retrieve: The retrieve task starts with a (partial) problem description and ends 
when a best matching previous case has been found. 

• Reuse: The reuse of the retrieved case solution in the context of the new case 
focuses on two aspects: (a) the differences between the past and the current case and 
(b) which part of the retrieved case can be mapped to the new case. 

• Revise: Case revision consists of two tasks: (a) evaluate the case solution generated 
by reuse and, if it is successful, learn from the success (case retainment); (b) otherwise, 
repair the case solution using domain-specific knowledge. 

• Retain: This is the process of incorporating into the existing knowledge what is 
useful to retain from the new problem-solving episode. 



3 Description Logics 

Description logics (DL) are a class of knowledge representation formalisms (see [6] 
for more details). They can be used to construct a knowledge base containing 
knowledge about a specific application. In this paper we propose a DL model for the case 
base. The DL formalism consists of two parts. The terminological formalism enables 
defining the abstract conceptual frame through which the real world is to be seen. 
Using this formalism, concepts can be introduced and relations between these concepts 
can be defined. With the assertional formalism, concrete facts about the real world can 
be described, i.e. knowledge about particular objects, the so-called individuals. 
Individuals are defined to be instances of concepts, and relations between individuals 
can also be introduced. So, a world description (or ABox) can be constructed based on a 
given terminology (or TBox). 

Besides, a description logic-based knowledge representation supports several 
inference methods for reasoning about the represented knowledge. They can automatically 
make explicit knowledge which is only implicitly represented in a knowledge base. 
The basic reasoning facilities are the satisfiability and the consistency test. They allow 
for checking whether a description logic knowledge base is contradiction-free. The 
subsumption inference computes the subsumption relation between two concepts, and 
can thus be used to organize the concepts of the knowledge base in taxonomical order, 
i.e. to classify the concepts. A concept C subsumes a concept D iff C is more general 
than or equal to D, i.e. the set of individuals denoted by D, the extension of D, is 
included in or equal to the extension of C. Besides, the realization inference 
calculates the set of most specific concepts for a given individual. This is done with respect 
to the TBox containing the concept which is instantiated by the individual. 
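
The set-inclusion reading of subsumption and the idea behind realization can be illustrated on explicit, finite extensions. This is only a sketch: a DL reasoner such as CICLOP decides subsumption from the concept definitions, for all possible interpretations, whereas the code below checks a single hand-written interpretation. The extensions dictionary and the individual iv7 are made up for the illustration.

```python
from typing import Dict, FrozenSet, List

# Hypothetical extensions: concept name -> set of individuals it denotes.
extensions: Dict[str, FrozenSet[str]] = {
    "variable": frozenset({"iv1", "iv3", "iv7"}),
    "V1": frozenset({"iv1"}),
    "V3": frozenset({"iv3"}),
}


def subsumes(c: str, d: str) -> bool:
    """C subsumes D iff the extension of D is included in or equal to that of C."""
    return extensions[d] <= extensions[c]


def most_specific_concepts(individual: str) -> List[str]:
    """Realization on explicit extensions: drop every concept that has a
    strictly smaller concept also containing the individual."""
    candidates = [c for c, ext in extensions.items() if individual in ext]
    return sorted(
        c for c in candidates
        if not any(extensions[other] < extensions[c] for other in candidates)
    )
```

Here subsumes("variable", "V1") holds while subsumes("V1", "variable") does not, and the most specific concept of iv1 is V1 rather than variable.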



4 Organization and Implementation of the Case Base Model 

The hybrid diagnosis system proposed in [1] uses a CBR system containing a set of 
cases, i.e. failure examples, which represent the state of the system when the failure 
occurs. A case is described by the quintuple <ABN, NOR, DIAG, REPAIR, CONS>, 
where: 

• ABN = {(Variable-name, qualitative-value, tendency)}, the set of all abnormal 
variables which characterize the considered case. 

• NOR = {(Variable-name, qualitative-value, tendency)}, the set of the normal 
variables which are used for the computation of the value of the elements of ABN. 

• DIAG = {(elements, failure-mode)}, the corresponding diagnosis of the case. 

• REPAIR = "repair plan", the corresponding repair plan of the case. 

• CONS = {(Variable-name, qualitative-value, tendency)}, the set of variables 
which are affected by the elements of ABN. 

The stored cases are represented as individuals in the ABox and indexed by abstract 
cases, which are concepts of the TBox. The taxonomy of abstract cases is 
automatically computed using the corresponding DL inference. Cases are automatically linked 
to their corresponding abstract cases using the realization inference [7]. Note that in 
the following implementation only the set of abnormal variables, which corresponds 
to the problem part, is used to represent a case. 
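
As a rough illustration of the quintuple and of indexing cases by abstract cases, the structure can be written down as a plain data class. The field names follow the quintuple, but the class itself, the helper index_by_abstract_case and the second example case are hypothetical; in the paper the indexing is done by the DL realization inference, not by a dictionary.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Tuple

Observation = Tuple[str, str, str]   # (variable name, qualitative value, tendency)


@dataclass(frozen=True)
class Case:
    name: str
    abn: FrozenSet[Observation]                       # abnormal variables (problem part)
    nor: FrozenSet[Observation] = frozenset()         # normal variables
    diag: FrozenSet[Tuple[str, str]] = frozenset()    # (element, failure mode)
    repair: str = ""                                  # repair plan
    cons: FrozenSet[Observation] = frozenset()        # affected variables

    def abstract_case(self) -> FrozenSet[str]:
        """Index key: only the names of the abnormal variables."""
        return frozenset(variable for variable, _, _ in self.abn)


def index_by_abstract_case(cases: List[Case]) -> Dict[FrozenSet[str], List[str]]:
    """Group case names under the abstract case they instantiate."""
    index: Dict[FrozenSet[str], List[str]] = {}
    for case in cases:
        index.setdefault(case.abstract_case(), []).append(case.name)
    return index


case_1 = Case("case_1", frozenset({("V1", "big", "stable"), ("V3", "medium", "increasing")}))
# Made-up second case with the same abnormal variables but different values.
case_2 = Case("case_2", frozenset({("V1", "small", "stable"), ("V3", "big", "decreasing")}))
index = index_by_abstract_case([case_1, case_2])
```

Both cases land under the same key {V1, V3}, since qualitative values and tendencies are left out of the abstract case.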

As mentioned above, we use the DL system CICLOP [2] for the implementation of 
the CBR system. In our model we use the description logic ALCF 
(Attributive Language with Complements and Features) with strings, which comprises 
the language constructors AND, OR, SOME, ALL, NOT and string concept 
descriptions. For more details see [2]. We define the following terminology: 

(define-string-concept qvalue (aset big medium small)) 
(define-string-concept tendency 
    (aset increasing stable decreasing)) 
(define-primitive-attribute has_qvalue) 
(define-primitive-attribute has_tendency) 
(define-concept variable (AND 
    (SOME has_qvalue qvalue) 
    (SOME has_tendency tendency))) 
(define-primitive-role has_variable) 
(define-concept abstract_case 
    (SOME has_variable variable)) 







First, we define the qualitative value of the variables, which can be big, medium, or 
small, and the tendency of the variables, which is decreasing, increasing, or stable. 
Then, for the definition of the variables, we use the two attributes has_qvalue and 
has_tendency. Finally, the abstract case, which is characterized by the set of the 
abnormal variables, is defined using the role has_variable. The defined terminology 
enables the definition of corresponding individuals. As an example, consider the 
following case characterized by the ABN set 

ABN = {(V1, big, stable), (V3, medium, increasing)}. 

The DL representation in the TBox of its abstract case is as follows: 

(define-primitive-concept V1 variable) 
(define-primitive-concept V3 variable) 
(define-concept abstract_case_1 
    (AND (SOME has_variable V1) (SOME has_variable V3))) 

The corresponding instance definitions are: 

(instance iv1 
    (AND V1 (SOME has_qvalue qvalue (aset big)) 
            (SOME has_tendency (aset stable)))) 
(instance iv3 
    (AND V3 (SOME has_qvalue qvalue (aset medium)) 
            (SOME has_tendency (aset increasing)))) 
(instance case_1 abstract_case_1) 
(related case_1 iv1 has_variable) 
(related case_1 iv3 has_variable) 

Here we define two variables V1 and V3 as concepts. Then we define 
abstract_case_1, which has the two variables V1 and V3. After that we define the 
individual iv1 as an instance of V1 and iv3 as an instance of V3. iv1 has a big value and 
is stable; iv3 has a medium value and is increasing. Finally we define the 
individual case_1 as an instance of abstract_case_1. Note that the variable 
descriptions qualitative value and tendency only characterize the variables. They do not 
influence the classification of the abstract cases. 



5 Case-Based Reasoning Tasks 

The CBR tasks presented in Section 2 are developed in the following: 



Retrieve 

The role of the retrieve task is first, to determine the most specific concepts of the 
current case Cc and second, to sort these abstract cases using the following measure: 



\[
\eta(AC_i, AC_c) = \frac{\mathrm{Card}(ABN_{AC_i} \cap ABN_{AC_c})}{\mathrm{Card}(ABN_{AC_c})} \tag{1}
\]



which indicates the degree of similarity between the retrieved abstract cases AC_i and 
the abstract current case AC_c, represented by the number of common abnormal 
variables. 

Finally, it computes, among the individual cases of the retrieved abstract cases, the 
case most similar to the current one by using the case-based learning (CBL) 
algorithm CBL1 [8], which defines the similarity of the cases C_1 and C_2 as: 

\[
\mathrm{Similarity}(C_1, C_2, P) = \frac{1}{\sqrt{\sum_{i \in P} \mathrm{Feature\_dissimilarity}(C_{1i}, C_{2i})}}, \qquad 0 < \mathrm{Similarity} < 1, \tag{2}
\]

where P is the set of predictor features, which are the variables and their values in our 
case. The Feature_dissimilarity is calculated as follows: 

\[
\mathrm{Feature\_dissimilarity}(C_{1i}, C_{2i}) =
\begin{cases}
0, & \text{if } C_{1i} = C_{2i}, \\
1, & \text{otherwise.}
\end{cases} \tag{3}
\]



If several cases have the same similarity, we use the coverage criterion (see [9]): 

\[
\mathrm{Coverage}_{AC_i} = \frac{\text{Number of individuals of } AC_i}{\text{Total number of individuals in the case base}}, \qquad 0 < \mathrm{Coverage} < 1. \tag{4}
\]



This value compares the number of individual cases of a certain abstract case with the 
total number of cases in the case base. It therefore reflects the importance of a given 
abstract case. 



Reuse 

• If an identical abstract case was found, the solution of its most similar individual 
case is directly applied to the current case. 

• If no identical abstract case was found, the CBR system uses the procedure 
described in Fig. 2 to propose a solution. 

Revise 

The revise task consists of validating the proposed solution by simulation using the 
model of the system. This task is carried out by the hybrid diagnosis system, and its 
description is beyond the scope of this paper. 



Retain 

The new abstract case is introduced into the case base by classification, if it does not 
exist already. Furthermore, the new individual case is defined to instantiate its 
abstract case. 







/* start from the set of selected cases */ 

BEGIN 

1. select the solution (DIAG1) of the case with the highest similarity value; 

2. create an abstract case: non-resolved-part; 

3. if non-resolved-part exists then 
   begin 
      for c = each case do compute similarity(c, non-resolved-part); 
      select the solution (DIAG2) of the most similar case; 
      DIAGcurrent-case = DIAG1 ∪ DIAG2; 
   end; 

4. if non-resolved-part does not exist then 
   begin 
      compute the set of the children of non-resolved-part; 
      select the children whose ABN is a subset of the ABN of the 
      abstract-current-case; 
      if exists then 
      begin 
         for c = each case do compute similarity(c, non-resolved-part); 
         select the solution (DIAG2) of the most similar case; 
         DIAGcurrent-case = DIAG1 ∪ DIAG2; 
      end; 
      else DIAGcurrent-case = DIAG1; 
   end; 

END 



Fig. 2. The reuse procedure 
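
The reuse procedure can be approximated in a few lines of Python. The sketch below rests on several assumptions that are not in the paper: cases are reduced to their sets of abnormal variable names, "children of non-resolved-part" is read as stored abstract cases that strictly contain the unresolved variables, the function closeness is a simple stand-in for the similarity measure, and the diagnosis labels are placeholders. The most similar selected case is passed in directly as the argument best.

```python
from typing import FrozenSet, List, Set, Tuple

Variables = FrozenSet[str]
StoredCase = Tuple[str, Variables, Set[str]]   # (name, abnormal variables, diagnosis labels)


def closeness(case: StoredCase, target: Variables) -> float:
    """Stand-in similarity: shared variables over all variables involved."""
    return len(case[1] & target) / len(case[1] | target)


def reuse(current: Variables, best: StoredCase, case_base: List[StoredCase]) -> Set[str]:
    diag1 = set(best[2])                       # step 1: solution of the most similar case
    unresolved = current - best[1]             # step 2: the non-resolved part
    if not unresolved:
        return diag1
    # Step 3: the non-resolved part exists as an abstract case in the case base.
    candidates = [c for c in case_base if c[1] == unresolved]
    if not candidates:
        # Step 4 (assumed reading): stored abstract cases that contain the
        # unresolved variables and stay inside the current ABN.
        candidates = [c for c in case_base if unresolved < c[1] and c[1] <= current]
    if not candidates:
        return diag1
    most_similar = max(candidates, key=lambda c: closeness(c, unresolved))
    return diag1 | set(most_similar[2])


def stored(name: str, *variables: str) -> StoredCase:
    return (name, frozenset(variables), {f"DIAG-{name}"})   # placeholder diagnosis label


# Abstract cases of the example in Section 7 (Table 1), names only.
case_base = [
    stored("case-1", "V1"), stored("case-2", "V1", "V2"), stored("case-3", "V1", "V4"),
    stored("case-4", "V1", "V3", "V5"), stored("case-5", "V2", "V5"),
    stored("case-6", "V3", "V5", "V7"), stored("case-7", "V1", "V2", "V4"),
    stored("case-8", "V1", "V2", "V4"),
]
current = frozenset({"V1", "V2", "V3", "V4", "V5"})
diagnosis = reuse(current, case_base[6], case_base)   # best case: case-7
```

With case-7 as the best case, the unresolved part is {V3, V5}; it is not stored as such, and the only stored abstract case that contains it without leaving the current ABN is that of case-4, so the sketch combines the two placeholder diagnoses.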



6 General Algorithm 

When the hybrid diagnosis system (HDS) detects a fault, it creates a current case and 
its abstract case. Then it starts the CBR process by sending the corresponding 
commands to the system CICLOP. The interaction between the HDS and CICLOP is 
realized as described in Fig. 3. For this task CICLOP is run as a server and 
communicates with another client via a simple, text file-based protocol. Note that in our 
application the client is the hybrid diagnosis system implemented with G2. The general 
procedure of the behavior of the CBR system is described in Fig. 4. 





[Figure: CICLOP Server with Reading Commands, Writing Commands and Writing Results exchanges] 

Fig. 3. The CICLOP server protocol 



/* start from the given current-case and its abstract-current-case */ 

BEGIN 

1. if the current case is a failed case then go to END 

2. if the current-abstract-case exists then 
   begin 
      select its concrete cases; 
      for c = each case do compute similarity(c, current-case); 
      select the most similar case mc; 
      copy the solution of mc to the current-case; 
   end; 

3. if the current-abstract-case does not exist then 
   begin 
      compute the set of the most specific abstract cases of the current case; 
      reuse the solution of the retrieved abstract cases for the current-case; 
      revise the proposed solution by simulation; 
      if success then 
      begin 
         retain the new abstract-case; 
         go to END 
      end; 
      if there is new available data then go to 1 
      else retain the current-case as a failed case; 
   end; 

END 



Fig. 4. The general procedure 



7 Example of a Reasoning Session 

To illustrate the reasoning process we consider the example of a system with the 
following set of observed variables {V1,V2,V3,V4,V5,V6,V7}. The seven abstract cases 
are represented in Fig. 5. The corresponding eight cases are listed below and 
summarized in Table 1. 










ABN_case_1 = {(V1, big, stable)} 
ABN_case_2 = {(V1, medium, decreasing), (V2, small, increasing)} 
ABN_case_3 = {(V1, small, stable), (V4, medium, decreasing)} 
ABN_case_4 = {(V1, medium, stable), (V3, big, decreasing), (V5, medium, increasing)} 
ABN_case_5 = {(V2, small, decreasing), (V5, medium, decreasing)} 
ABN_case_6 = {(V3, medium, increasing), (V5, small, increasing), (V7, big, decreasing)} 
ABN_case_7 = {(V1, big, decreasing), (V2, small, increasing), (V4, big, stable)} 
ABN_case_8 = {(V1, medium, decreasing), (V2, big, stable), (V4, medium, decreasing)} 




Fig. 5. The graphical representation of the case base 



Table 1. Cases and their corresponding abstract cases 



Abstract cases                   Cases
Abstract-case-1 <V1>             Case-1
Abstract-case-2 <V1, V2>         Case-2
Abstract-case-3 <V1, V4>         Case-3
Abstract-case-4 <V1, V3, V5>     Case-4
Abstract-case-5 <V2, V5>         Case-5
Abstract-case-6 <V3, V5, V7>     Case-6
Abstract-case-7 <V1, V2, V4>     Case-7, Case-8



Suppose we have the following ABN of the current case:

ABNcurrent_case {(V1, big, stable), (V2, medium, increasing), (V3, small, decreasing),
(V4, medium, increasing), (V5, big, stable)}.

The abstract case corresponding to this current case is <V1,V2,V3,V4,V5>.



Table 2. Selected cases and their similarity levels with respect to the current-case 



Selected cases   Abstract cases    η      Coverage   Similarity
Case-4           Abstract-case-4   0.6    0.125      0.316
Case-5           Abstract-case-5   0.4    0.125      0.271
Case-7           Abstract-case-7   0.6    0.25       0.316
Case-8           Abstract-case-7   0.6    0.25       0.301









• Retrieve: This abstract case does not exist in the case base. So, we compute the set
of the most specific abstract cases

{Abstract-case-4, Abstract-case-5, Abstract-case-7}

For each individual corresponding to these abstract cases we calculate the similarity
measures with respect to the current case individual. Using the similarity
measures described in Section 5, we obtain the results shown in Table 2.

• Reuse:

1. The case with the highest similarity value is Case-7. The first diagnosis part
DIAG1 is therefore chosen to be the diagnosis of Case-7.

DIAG1 = DIAGcase-7

Case-7 represents only V1, V2, V4, so we still have to find a solution for the
remaining variables.

2. The abstract case representing the non-resolved part, non-resolved-part
<V3, V5>, does not exist in the case base, which is why we continue with step 4.

4. The only direct subsumer of the non-resolved part whose ABN is a subset of the
ABN of the abstract current case is Abstract-case-4. Since Abstract-case-4
contains only one individual case, which is Case-4, the second part of the diagnosis,
DIAG2, is the diagnosis of this Case-4:

DIAG2 = DIAGcase-4

Finally, the diagnosis of the current case is the union of the two parts.

DIAGcurrent-case = DIAG1 ∪ DIAG2

• Revise: Suppose the proposed solution has been simulated and validated in the real
world with success; then we go to the next step.

• Retain: The current abstract case, Abstract-case-8 in our example, is learned, i.e.
classified into the case base as a concept. This way the CB is updated. Also, the
current individual is inserted as an instance of Abstract-case-8 (see Fig. 6).
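The Retrieve step of this session can be sketched in a few lines of Python. This is only an illustration under a strong simplifying assumption: each abstract case is reduced to the set of variables it mentions (as in Table 1), and subsumption is approximated by set inclusion, whereas the paper relies on a description logic classifier. The names `ABSTRACT_CASES` and `most_specific_abstract_cases` are hypothetical.

```python
# Hypothetical encoding of Table 1: abstract case -> set of variables it mentions.
ABSTRACT_CASES = {
    "Abstract-case-1": {"V1"},
    "Abstract-case-2": {"V1", "V2"},
    "Abstract-case-3": {"V1", "V4"},
    "Abstract-case-4": {"V1", "V3", "V5"},
    "Abstract-case-5": {"V2", "V5"},
    "Abstract-case-6": {"V3", "V5", "V7"},
    "Abstract-case-7": {"V1", "V2", "V4"},
}


def most_specific_abstract_cases(current_vars, abstract_cases):
    """Return the abstract cases contained in current_vars that are not
    strictly contained in another such abstract case (set-inclusion
    approximation of "most specific subsumer")."""
    candidates = {name: vs for name, vs in abstract_cases.items()
                  if vs <= current_vars}
    return sorted(
        name for name, vs in candidates.items()
        if not any(vs < other for other in candidates.values())
    )


current = {"V1", "V2", "V3", "V4", "V5"}
selected = most_specific_abstract_cases(current, ABSTRACT_CASES)
print(selected)

# Variables left unresolved once Case-7 (Abstract-case-7) has been reused.
unresolved = current - ABSTRACT_CASES["Abstract-case-7"]
print(sorted(unresolved))
```

Under this approximation the selected set is the one given in the Retrieve step, and the unresolved variables are V3 and V5, as in step 2 of Reuse. The similarity values of Table 2 are not recomputed here, since they depend on the measures of Section 5.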



8 Conclusion and Future Work 

In this paper we proposed how to design a case-based reasoning system using
description logics. The formal DL representation enables the use of corresponding
inference methods, and thus provides efficient means for classifying and querying the
case base.

We have described the main processes of the CBR cycle and how they have been
realized in our approach. The work which remains to be done is:

• investigating the retrieve task, by using the (NOR) set to perform more efficient
similarity measures on the one hand, and to better distinguish between the
cases on the other.

• investigating the reuse task, by using more efficient techniques such as taking into
account the knowledge about the solution part (DIAG) of the case and/or the
failure mode of the components, for instance.








Fig. 6. The case base after the new abstract case has been learned 



References 

1. Zeghib, Y., Rousselot, F., Keith, B.: Building a Diagnostic System for Large
Industrial Plants: Using Symbolic and Numerical Knowledge. IAR 98,
Mulhouse (1998) 46-52

2. De Bertrand de Beuvron, F., Rousselot, F., Grathwohl, M., Rudloff, D., Schlick,
M.: CICLOP. System Comparison of the International Workshop on Description
Logics '99, Linköping, Sweden (1999)

3. Lenz, M., Bartsch-Spörl, B., Burkhard, H.-D., Wess, S.: Case-Based Reasoning
Technology: From Foundations to Applications. Springer-Verlag, Berlin
Heidelberg New York (1998)

4. Aamodt, A., Plaza, E.: Case-Based Reasoning: Foundational Issues,
Methodological Variations, and System Approaches. Artificial Intelligence
Communications, Vol. 7, No. 1 (1993)

5. Napoli, A., Lieber, J., Simon, A.: A Classification-Based Approach to
Case-Based Reasoning. International Workshop on Description Logics, Gif sur Yvette,
France (1997)

6. Baader, F., Hollunder, B.: A Terminological Knowledge Representation System
with Complete Inference Algorithms. In Proceedings of the Workshop on
Processing Declarative Knowledge, PDK-91, number 567. Lecture Notes in
Artificial Intelligence. Springer-Verlag (1991)

7. Salotti, S., Ventos, V.: Study and Formalization of a Case-Based Reasoning
System with a Description Logic. LIPN-CNRS URA 1507, Université
Paris-Nord, France (1997)

8. Aha, W. D.: Case-Based Learning Algorithms. Case-Based Reasoning
Workshop. Morgan Kaufmann (1991)

9. Maria, M.: Etudes des aspects liés au contenu et à l'organisation de la mémoire
dans le RàPC. Ecole des Mines, Paris (1998)





Printer Troubleshooting Using Bayesian Networks 

Claus Skaanning¹, Finn V. Jensen², and Uffe Kjærulff²

¹ Hewlett-Packard Company
Claus_Skaanning@hp.com

² Department of Computer Science, Aalborg University, Denmark
{fvj, uk}@cs.auc.dk



Abstract. This paper describes a real world Bayesian network application - 
diagnosis of a printing system. The diagnostic problem is represented in a 
simple Bayes model which is sufficient under the single-fault assumption. The 
construction of this Bayesian network structure is described, along with 
guidelines for acquiring the necessary knowledge. Several extensions to the 
algorithms of [2] for finding the best next step are presented. The 
troubleshooters are executed with custom-built troubleshooting software that 
guides the user through a good sequence of steps. Screenshots from this
software are shown.



1 Introduction 

In this paper we will describe an application of Bayesian networks in the area of 
diagnosis. The application is a result of a project that is a collaboration between the 
decision support systems group of Aalborg University in Denmark and Customer 
Support Research and Development, Hewlett-Packard. 

Diagnosis has been an interesting application area for AI methodologies
due to its high complexity and its requirements for data [5, 6, 2]. The purpose of 
diagnostic systems is to ultimately determine the set of faults that best explains the 
symptoms. The system can request information from the world, and each time new 
information is obtained, it will update its current view of the world. 

Our application is a printing system which consists of several components: the
application being printed from, the printer driver, the network connection, the printer
itself, etc. It is a complex task to troubleshoot such a system and the printer industry
spends millions of dollars a year on customer support. Therefore, automating the 
troubleshooting process as much as possible would be highly beneficial. If the 
customer is guided through a successful diagnostic sequence that concludes with a 
solution to his problem, then one less phone call will be received. If, on the other 
hand, the troubleshooter is unable to find a solution, all the information gathered so 
far will be transferred to a support agent who will continue the troubleshooting. 

This work is partly based on the methodology of [7], which provides a framework for
suggesting sequences of questions (observations in their terminology), repair actions,
and configuration changes to obtain further information. Assuming only a single fault
and independent actions, the method finds the optimal sequence of actions. It is,
however, myopic with respect to questions, i.e., limited to one-step lookahead for
these. We have modeled the printing system with a Bayesian network: a directed
acyclic graph representing the causal relationships between variables, which
associates conditional probability distributions with variables given their parent
variables.



R. Loganantharaj et al. (Eds.): IEA/AIE 2000, LNAI 1821, pp. 367-380, 2000.
© Springer-Verlag Berlin Heidelberg 2000



2 An Example 



In Figure 1 we show a screenshot of the SACSO troubleshooter. The 
troubleshooter guides the user through a good troubleshooting sequence to resolve the 
error condition that he is currently experiencing. The graphical user interface allows 
the SACSO team to track the computations of the algorithms for finding the best next 
step. The troubleshooter can suggest repair actions that may solve the problem, or 
questions about the printing system. 

The user interface shows the currently suggested steps, and waits until the user 
provides the result to the step (whether an action solved the problem or not, or the 
answer to a question). The currently suggested error condition is light print - a 
common problem on printers. The problem of light print has both hardware and 
software causes, and some of the first troubleshooting steps selected by the diagnostic 
engine attempt to decide whether the cause is in the hardware or software section, 
e.g., "Is the printer configuration page printed light?".






[Figure: screenshot of the troubleshooter for the Light Print problem, with panels listing the performed and deferred steps, the causes with their probabilities, the solution actions with their P/C ratios, and the questions and test actions with their ECO values.]



Fig. 1. A screenshot of the SACSO troubleshooter. 

The troubleshooter continuously displays a list of causes sorted wrt. their 
probabilities, a list of repair actions sorted wrt. their ratios P / C (probability of 
solving the problem divided by cost of performing the action), and a list of questions 
and test actions sorted wrt. ECO (expected cost of asking the question first and then 
followed by a sequence of actions to solve the problem). 







An example run with the troubleshooter could look as follows. The HP MIO1 not
ready error code will be used as an example. MIO1 refers to one of the network cards
used to connect a printer to the network. Assuming that a defective printer network 
card is the cause of the problem, the troubleshooter will guide the customer through 
the following actions and questions: 

□ Question: Did you wait 5 minutes for initialization? The customer answers no and 
is told to wait 5 minutes for proper initialization. As it does not solve the problem, 
the system continues. 

□ Action: Remove network cable. This is done first, as it can rule out a relatively 
likely cause (data from the network causing the error code - 10%) with a very low 
cost (1 minute). It does not solve the problem, and the system continues. 

□ Action: Try another network card. This is done next, as it can help to rule out one 
of the most likely causes, defective card (20%). In this case it solves the problem 
but the system cannot determine the underlying cause. The user has to reinsert the 
old network card and continue troubleshooting. 

□ Action: Ensure that the network card is supported by the printer. As the card 
follows the specification this doesn't help and the system continues. 

□ Action: Reseat network card. This will rule out whether the network card was 
improperly seated. It does not solve the problem, and the system continues. 

□ Action: Move network card to another printer. As the card is defective, the other 
printer will show the same error code as the current. This information is reported 
to the troubleshooter that finally concludes that the card is defective. 



3 Bayesian Networks and Troubleshooting 



3.1 Bayesian Networks 

Bayesian networks provide a way of modeling problem areas using probability 
theory. The Bayesian network representation of the problem can then be used to 
provide information on some variables given information on others. A Bayesian 
network consists of a set of variables (nodes) and a set of directed arcs connecting the 
variables. Each variable has a set of mutually exclusive states. The variables, 
together with the directed arcs, form a directed acyclic graph (DAG). For each 
variable v with parents w1, ..., wn, there is specified a conditional probability table
P(v | w1, ..., wn). Obviously, if v has no parents, this table reduces to a marginal
probability distribution P(v). For further introduction to Bayesian networks the reader 
is referred to [9] and [4]. 

Bayesian networks have been used for many application domains with uncertainty, 
such as medical diagnosis, pedigree analysis, planning, debt detection, bottleneck 
detection, etc. However, the major application area has been diagnosis, which lends 
itself very well to the modeling techniques of Bayesian networks, i.e., underlying 
factors that cause diseases/malfunctions that again cause symptoms. 







The currently most efficient method for exact belief updating in Bayesian networks 
is the junction-tree method [10] that transforms the network into a so-called junction 
tree. The junction tree basically clusters the variables such that a tree is obtained (i.e., 
all loops are removed) and the clusters (cliques in the junction-tree terminology) are
as small as possible. In this tree, a message passing scheme can then update the
beliefs of all unobserved variables given the observed variables. Exact updating of
Bayesian networks is NP-hard [3]; however, it is still very efficient for some classes
of Bayesian networks.



3.2 Troubleshooting 



Assume that we want to troubleshoot a malfunctioning device with n possible 
underlying causes represented by the variables F1, ..., Fn (F for fault). In the printing
system application, components could for instance be the printer driver, the spooler,
etc. Assume that we have defined repair actions A1, ..., Ak that have the potential to
solve the problem, and that each repair action Ai has a probability Pi = P(Ai = yes | e)
of solving the problem given the current evidence e, and a cost Ci = Ci(history) of
performing the action that may depend on the actions that were performed in
the past. The cost may be combined from several cost factors such as the time it takes 
to carry out the action, money required to buy requisites, etc. 

The measure of efficiency for a troubleshooting sequence is the expected cost of 
repair, ECR, i.e., the average cost of repair for all possible troubleshooting sequences. 

Now, assume that we have performed some actions already, and the information e 
acquired from these actions yields probabilities Pi = P(Ai=yes | e) that the action Ai 
will terminate the sequence with an additional cost of Ci. Consider two candidate 
actions Ai and Aj. We will investigate the two scenarios: a) first perform Ai and if not 
successful then follow with action Aj, b) first perform Aj and if not successful then 
follow with action Ai. 

In the case of (a), the contribution to the expected cost of repair is:

\[ C_i + P(A_i = \text{no})\, C_j(i) \tag{1} \]

where Cj(i) is the cost of performing Aj after Ai has been performed. The contribution
to the expected cost of repair in the case of (b) is:

\[ C_j + P(A_j = \text{no})\, C_i(j) \tag{2} \]

So if

\[ C_i + P(A_i = \text{no})\, C_j(i) < C_j + P(A_j = \text{no})\, C_i(j) \tag{3} \]

it will be best to perform Ai first followed by Aj.

Eq. (3) is not very useful for the general troubleshooting situation but it is a good
starting point when looking for simplifying properties as well as approximation
methods. If for example we can assume that the cost of an action is independent of
what has been performed and observed, then, since P(Ai = no) = 1 - Pi, Eq. (3) becomes

\[ \frac{P_i}{C_i} > \frac{P_j}{C_j} \tag{4} \]







The value Pi/Ci is the efficiency of the action Ai. Note that the efficiency of an 
action varies with actions performed and observations made. 

A tempting approach for determining an optimal sequence would be to always 
perform the action with the highest efficiency. Unfortunately, this method does not 
guarantee that an action sequence of lowest expected cost of repair is determined. 
However, if we further can assume that during execution of actions the 
probabilities Pi for the remaining actions are all changed by the same factor, then all 
efficiencies will also be changed by a constant factor. In that case we can start the 
troubleshooting task by simply ordering the actions by decreasing efficiency, and
this will be a sequence with lowest expected cost. This extra assumption is satisfied if 
we assume : 



1 . There is only one fault 

2. Each action can at most solve one cause 



Troubleshooting sequences under these conditions are discussed further in [7]. If
these assumptions are satisfied, it is straightforward to show that the expected cost of
repair for an optimal sequence A1, ..., Ak is

\[ \mathrm{ECR} = C_1 + (1 - P_1)\, C_2 + (1 - P_1 - P_2)\, C_3 + \dots + \Bigl(1 - \sum_{i=1}^{k-1} P_i\Bigr) C_k \tag{5} \]
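As a small illustration of Eq. (5) and of ordering by efficiency, the following Python sketch computes the expected cost of repair of a sequence and checks the efficiency ordering against a brute-force search over all orderings. The three (probability, cost) pairs are invented for the example, and the function names are hypothetical; the sketch assumes a single fault, costs that do not depend on history, and actions that each address a distinct cause.

```python
from itertools import permutations


def expected_cost_of_repair(sequence):
    """ECR of an action sequence as in Eq. (5).

    sequence: list of (P_i, C_i) pairs, where P_i is the probability that
    action i solves the problem and C_i is its cost.
    """
    total = 0.0
    p_unsolved = 1.0  # probability that all earlier actions have failed
    for p, c in sequence:
        total += p_unsolved * c  # the action is only paid for if it is reached
        p_unsolved -= p          # single fault, one cause per action
    return total


def order_by_efficiency(actions):
    """Sort actions by decreasing efficiency P_i / C_i."""
    return sorted(actions, key=lambda pc: pc[0] / pc[1], reverse=True)


# Invented example values: (probability of solving, cost in minutes).
actions = [(0.2, 5.0), (0.5, 10.0), (0.3, 2.0)]

greedy = order_by_efficiency(actions)
greedy_ecr = expected_cost_of_repair(greedy)
brute_force_ecr = min(expected_cost_of_repair(list(p))
                      for p in permutations(actions))
print(greedy, round(greedy_ecr, 6), round(brute_force_ecr, 6))
```

For these values the efficiency ordering and the exhaustive search agree on an expected cost of about 10 minutes, which is what the two assumptions above are meant to guarantee.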

Assume further that we have questions Q1, ..., Qm that can be asked to supply
information about the error condition. To determine whether it is best to ask question
Qi first, we compare the ECR of the optimal sequence with ECOi:

\[ \mathrm{ECO}_i = C_i + \sum_{s} P(Q_i = s) \times \mathrm{ECR}(Q_i = s) \tag{6} \]

where Ci is the cost of finding an answer to the question, s ranges over the possible
answers to the question, and ECR(Qi=s) is the expected cost of the optimal sequence
given that Qi has been answered with s.
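The comparison in Eq. (6) can be illustrated with a short Python sketch. It is a simplified setting rather than the troubleshooter's implementation: one action per cause that always fixes that cause, so that Pi equals the current probability of cause i, and a single yes/no question described by the invented likelihoods P(Q = s | cause). All names and numbers are hypothetical.

```python
def ecr_by_efficiency(cause_probs, costs):
    """ECR of the efficiency-ordered sequence when action i fixes cause i."""
    order = sorted(range(len(cause_probs)),
                   key=lambda i: cause_probs[i] / costs[i], reverse=True)
    total, p_unsolved = 0.0, 1.0
    for i in order:
        total += p_unsolved * costs[i]
        p_unsolved -= cause_probs[i]
    return total


def expected_cost_of_observation(prior, costs, answer_likelihoods, question_cost):
    """ECO of asking the question first, as in Eq. (6).

    answer_likelihoods: {answer s: [P(Q = s | cause j) for each cause j]}.
    """
    total = question_cost
    for likelihood in answer_likelihoods.values():
        p_answer = sum(l * p for l, p in zip(likelihood, prior))
        if p_answer == 0.0:
            continue  # this answer cannot occur
        posterior = [l * p / p_answer for l, p in zip(likelihood, prior)]
        total += p_answer * ecr_by_efficiency(posterior, costs)
    return total


prior = [0.5, 0.3, 0.2]    # invented cause probabilities
costs = [10.0, 4.0, 2.0]   # invented action costs
likelihoods = {"yes": [0.9, 0.1, 0.1], "no": [0.1, 0.9, 0.9]}

ecr_without_question = ecr_by_efficiency(prior, costs)
eco_with_question = expected_cost_of_observation(prior, costs, likelihoods,
                                                 question_cost=1.0)
print(round(ecr_without_question, 6), round(eco_with_question, 6))
```

With these numbers the question costs 1 but lowers the expected cost from about 10.2 to about 9.0, so asking it first would pay off; with a question cost above roughly 2.2 it would not.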




Fig. 2. An example of the very simple Bayesian network structure used for
troubleshooters.



In the troubleshooting process, a time may come where remaining actions have too 
low efficiency. Then, it may be better instead to give up and call for assistance. This 
service call is represented as an ordinary repair action with probability 1 of solving 
the problem and a very high cost. 







In the area of printer systems the costs of actions are very rarely dependent on the 
past, and there is almost always just one fault. However, actions can often solve more
than one cause; thus it is not possible to find the optimal sequence by sorting the
actions wrt. efficiency, and the expected cost of repair cannot be computed with
Eq. (5). Approximate algorithms for finding a good sequence and computing its ECR 
are presented in Section 5. 

The efficiency and simplicity of the representation and algorithms described in this 
paper all depend on the assumption that there is only a single fault, i.e., the single- 
fault assumption. The single-fault assumption is very natural in the printer domain 
where it is very rare that multiple components stop functioning at the same time. If 
multiple faults are present, the algorithms presented in this paper will also solve the
problem, though not in an optimal manner.



4 Representation and Knowledge Acquisition 



4.1 The Overall Structure 



The SACSO printing diagnosis system consists of many separate Bayesian 
networks each modeling a printing error. The networks can be kept separate as the 
exact error condition that the user is experiencing is always known at diagnosis time. 

Each of these models includes a cause variable that defines the probability
distribution over the causes of the error condition. The causes are modeled as the
states of this variable. All actions and questions that can be posed in the
troubleshooting process are represented as children of the cause variable. An example
is shown in Figure 2. The benefit of this Simple Bayes structure is that all actions and
questions are independent given the cause. This can be exploited in the algorithms for
finding the best next step as shown in later sections.

[Figure: a cause tree with the nodes Problem, Cause 1, Cause 2, and two Subcause nodes; the probability labels recovered from the figure are 0.1, 0.2, 0.4, 0.6, and 0.8.]

Fig. 3. A simple Bayesian network with an example probability assignment.

In the following it will be described how causes, actions and questions are 
represented and how the required information is acquired from domain experts. 







4.2 Causes 
Cause trees 

Causes are organized in a tree such that the root of the tree corresponds with the 
problem-defining node, i.e., the node indicating whether or not the problem is present. 
The children of the root node are causes or components that, if present or 
malfunctioning, cause the problem to be present. The children of causes or 
components are subcauses or causes that, if present, cause the presence of the parent 
cause, etc. 

The causes can always be organized into a tree due to the single-fault assumption. 
If only a single fault is assumed it is not possible to have subcauses that can cause 
more than one higher level cause simultaneously. If there is a subcause that can cause 
more than one higher level cause, then we must have that this subcause can be 
represented as two mutually exclusive components where each of them causes one of 
the higher level causes. In this case, the subcause might as well be represented as two 
independent subcauses each affecting its respective parent cause - so the tree structure 
is maintained. Thus, due to the single-fault assumption, loops are not possible and 
we have a tree of causes. 

A small example of such a cause tree is given in Figure 3. In this simple example,
we have a problem with two possible causes, and one of these has two possible
subcauses.

Knowledge acquisition

Probabilities for causes are acquired opposite of the causal direction in the cause
tree, i.e., the domain experts specify probabilities for causes conditional on the
presence of their parent cause. For the example in Figure 3, domain experts have to
specify P(Cause1 | Problem), P(Cause2 | Problem), P(Subcause1 | Cause1) and
P(Subcause2 | Cause1). An example specification of these probabilities can be seen in
Figure 3.

The SACSO domain experts were trained in troubleshooting printer system
problems; thus they were used to considering a set of causes conditional on the presence
of the problem, and in many cases a set of subcauses conditional on the presence of 
the parent cause when all other causes have been ruled out. 

With traditional methods, the domain experts assign marginal probabilities for the 
leaf causes, e.g., P(Cause2), P(Subcause1) and P(Subcause2) in Figure 3. When there
are many leaf causes, the probabilities of many of these must be small and eliciting 
them becomes harder. When probabilities of leaf causes are assessed assuming the 
presence of their parent cause, the domain expert only has to consider a small set of 
causes (those that can cause the parent cause). He must assess probabilities for this 
small set of causes such that they sum to 1 which is easier than assessing the 
unconditional probabilities. 

Representation in cause variable 

The probabilities elicited for the cause tree have to be transformed into a flat
probability distribution that can be represented in the cause variable. The cause
variable has a state for each leaf cause and the probability of the state is calculated as
the probability of the leaf cause given its parent cause multiplied by the product of
all the conditional probabilities of its ancestors given their parents, i.e., if leaf
cause F has parent F1, F1 has parent F2, ..., Fk-1 has parent Fk, and Fk is the root, then

\[ P(\text{cause variable} = F) = P(F \mid F_1) \times \prod_{j=1}^{k-1} P(F_j \mid F_{j+1}) \tag{7} \]

It is not a problem that non-leaf causes are not directly represented as states in the 
cause variable, as the probabilities of these aggregated causes can be found as the sum 
of all descendant leaf causes. 
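The flattening described by Eq. (7) amounts to multiplying the conditional probabilities along the path from the root to each leaf. The Python sketch below shows this for a tree shaped like the example (one problem, two causes, one of which has two subcauses). The probability values are invented for illustration and are not those of Figure 3; the dictionary layout and function name are likewise assumptions.

```python
def flatten_cause_tree(tree, root="Problem"):
    """Return {leaf cause: probability} for a cause tree.

    tree maps each non-leaf node to {child: P(child | node)}; the problem
    is assumed to be present, so the root has probability 1.
    """
    flat = {}

    def walk(node, prob):
        children = tree.get(node, {})
        if not children:          # leaf cause: becomes a state of the cause variable
            flat[node] = prob
            return
        for child, p_child in children.items():
            walk(child, prob * p_child)

    walk(root, 1.0)
    return flat


# Invented conditional probabilities; each set of siblings sums to 1.
tree = {
    "Problem": {"Cause1": 0.4, "Cause2": 0.6},
    "Cause1": {"Subcause1": 0.25, "Subcause2": 0.75},
}

flat = flatten_cause_tree(tree)
print(flat)

# A non-leaf cause is recovered as the sum of its descendant leaves.
p_cause1 = flat["Subcause1"] + flat["Subcause2"]
print(round(p_cause1, 6))
```

Because every set of siblings sums to 1, the flattened distribution also sums to 1, and the aggregated probability of Cause1 comes back as the elicited 0.4.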

One of the underlying assumptions is that the system is only used when there is a 
problem, thus it is not necessary to have a state representing no problem. 

The cause tree can also be represented in other manners; however, the single-variable
representation has the lowest complexity, with a resulting junction tree of the
order Np × Ns, where Np is the number of causes and Ns is the number of
troubleshooting steps.



4.3 Actions 
Definition and types 

Actions are troubleshooting steps that when carried out by the user can potentially 
make the problem go away. There are two types of actions, repair actions and test 
actions. Repair actions (e.g., reseat the parallel cable) can solve the problem and thus 
end the troubleshooting process whereas test actions (e.g., try another parallel cable) 
change the configuration of the system to test whether the problem goes away. No 
matter the answer to a test action, the troubleshooting process will continue. 

The knowledge acquisition and representation of the two types of actions are
exactly the same; however, they are treated differently in the algorithms for finding
the best next step described in Section 5. Repair actions are handled as actions
whereas test actions are handled similarly to questions.

Here, actions are not associated with a single cause as in [2]. In practice this poses 
much too severe restrictions on the domain experts when constructing 
troubleshooters, as many actions naturally affect more than one cause. 

Knowledge acquisition 

The knowledge acquisition for actions consists of three steps, (i) listing the causes 
that the action can solve, (ii) eliciting probabilities that the action solves these causes, 
and (iii) eliciting the cost factors of the action. 

If an action solves more than one cause, it is sufficient to elicit one probability for 
each cause that it solves, i.e., the probability of the action solving the problem 
assuming that the cause is the actual underlying cause. Due to the single-fault 
assumption, it is not necessary to consider combinations of the causes. 

The assessment of the probability that an action solves a cause has been split into 
three pieces, (i) P(A=yes | f, correct, requisites), the probability of the action solving 
the problem assuming that the cause f is present, the action is performed correctly and 
all requisites are in perfect order, (ii) P(correct), the probability that the action is 
performed correctly, and (iii) P(requisites), the probability that all requisites are in 
working order. The three probabilities are then combined into one : 

\[ P(A = \text{yes} \mid f) = P(A = \text{yes} \mid f, \text{correct}, \text{requisites}) \times P(\text{correct}) \times P(\text{requisites}) \tag{8} \]







This probability elicitation is much easier for the domain experts if it is split into
three pieces, as they then do not have to balance several factors simultaneously in their
minds. It is based on the assumption that the probability of the cause is independent
of whether the action is performed correctly and of whether the requisites are in working order.

The following cost factors are elicited for troubleshooting steps: (i) time, (ii) risk
of breaking something else, (iii) money required to carry out the step, and (iv) insult¹,
the potential insult to the user if this step is suggested. Time is elicited in minutes and
money in dollars; risk and insult are specified on a scale of 0 - 4.

The cost factors are combined linearly to form the overall cost of the step:

\[ C = \alpha T + \beta R + \gamma M + \delta I \tag{9} \]

The weights can be determined by having domain experts perform preference 
elicitations where they select the preferred cost combination from a list of choices. 
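Eqs. (8) and (9) are simple products and weighted sums, and a minimal Python sketch makes the bookkeeping explicit. The weight values below are placeholders chosen for the example only; the paper obtains the weights from preference elicitation and does not state them. The function names are hypothetical.

```python
def solve_probability(p_ideal, p_correct, p_requisites):
    """Eq. (8): P(A = yes | f) from its three elicited pieces."""
    for p in (p_ideal, p_correct, p_requisites):
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie in [0, 1]")
    return p_ideal * p_correct * p_requisites


def step_cost(time_min, risk, money, insult,
              weights=(1.0, 5.0, 1.0, 2.0)):  # placeholder weights, not from the paper
    """Eq. (9): linear combination of time, risk, money and insult."""
    if not (0 <= risk <= 4 and 0 <= insult <= 4):
        raise ValueError("risk and insult are specified on a scale of 0 - 4")
    alpha, beta, gamma, delta = weights
    return alpha * time_min + beta * risk + gamma * money + delta * insult


# Invented elicitation results for one action and one cause.
p_solve = solve_probability(p_ideal=0.9, p_correct=0.8, p_requisites=0.95)
cost = step_cost(time_min=3.0, risk=1, money=0.0, insult=2)
print(round(p_solve, 3), cost)
```

Here the combined probability is 0.684 and the cost is 12 in the units fixed by the placeholder weights, which could then be used in the efficiency ratio P/C.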



Representation 

Actions are represented as children of the cause variable. Thus, in the probability 
table for the action, we specify the probability of the action solving the problem given 
each possible cause. For causes that the action cannot solve, this probability will be 
0. For causes that the action can solve, the probability will be P(A=yes | f) as found 
above. 



4.4 Questions 

Questions are troubleshooting steps that gather information that can be used in the
troubleshooting process. There are two types of questions, symptom questions and 
general questions. Symptom questions concern symptoms or effects of the problem, 
and general questions concern something that could have caused or created the 
problem. For both types of questions, the domain expert must first list causes that are 
associated with the question. 

For symptom questions, the domain experts must elicit the probabilities of the 
question answers given each associated cause, and probabilities of the answers given 
that none of the associated causes are present. Questions are represented as children 
of the cause variable. Symptom questions are easy to represent as the elicited 
probabilities can be used directly in the probability table of the node. 

For general questions the domain experts must elicit the probabilities of the 
associated causes given the answer to the question, and the prior probabilities of the 
answers to the question. Thus, the direction of the arcs in the Bayesian network is 
reversed here. When eliciting these probabilities, the following equation must be 
maintained : 

\[ P(F) = \sum_{s} P(F \mid Q = s)\, P(Q = s) \tag{10} \]



¹ In printer systems, steps such as "Check whether the printer is turned on" or "Check whether
the parallel cable is plugged in" can be insulting to experienced users.







Thus, the probabilities that must be specified here are very much interdependent.
General questions are also represented as children of the cause variable in the
Bayesian network; however, to represent them this way we need to reverse the
probabilities from P(F | Q) to P(Q | F). This can be done by applying Bayes' formula:

\[ P(Q \mid F) = \frac{P(F \mid Q)\, P(Q)}{P(F)} \tag{11} \]

Due to the single-fault assumption and the assumption that questions are 
independent of each other given the causes, it can be shown that the probabilities can 
always be reversed using Bayes’ formula. 
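The consistency requirement of Eq. (10) and the reversal of Eq. (11) can be sketched in Python as follows. The two causes, the question answers and all numbers are invented for illustration, and the function name `reverse_general_question` is hypothetical; the sketch only shows the arithmetic, not how the troubleshooter stores its tables.

```python
def reverse_general_question(p_cause_given_answer, p_answer, cause_prior, tol=1e-9):
    """Turn elicited P(F | Q = s) and P(Q = s) into P(Q = s | F).

    First checks Eq. (10) against the cause prior, then applies Eq. (11).
    Returns {cause: {answer: P(Q = answer | cause)}}.
    """
    for cause, prior in cause_prior.items():
        implied = sum(p_cause_given_answer[s][cause] * p_answer[s] for s in p_answer)
        if abs(implied - prior) > tol:
            raise ValueError(f"Eq. (10) violated for cause {cause!r}")
    return {
        cause: {s: p_cause_given_answer[s][cause] * p_answer[s] / prior
                for s in p_answer}
        for cause, prior in cause_prior.items() if prior > 0.0
    }


# Invented elicitation for a yes/no general question and two causes.
p_answer = {"yes": 0.25, "no": 0.75}
p_cause_given_answer = {
    "yes": {"toner": 0.8, "driver": 0.2},
    "no": {"toner": 0.2, "driver": 0.8},
}
cause_prior = {"toner": 0.35, "driver": 0.65}

p_answer_given_cause = reverse_general_question(
    p_cause_given_answer, p_answer, cause_prior)
print(p_answer_given_cause)
```

Each row of the result sums to 1, so it can be used directly as the conditional probability table of a question node placed below the cause variable. If the elicited numbers do not satisfy Eq. (10), the function raises an error instead of silently producing an inconsistent table.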

Costs are elicited in the same way as for actions. 



5 Algorithms 



5.1 Finding a Good Sequence of Actions 

Assuming a single fault and a one-to-one correspondence between causes and 
actions, the optimal sequence of actions is found by sorting the actions in descending 
order wrt. P(A=yes) / C as described in Section 3.2. However, with the SACSO 
approach, actions can solve multiple causes, i.e., actions are dependent. When actions 
are dependent, it is in general impossible to find an optimal sequence in less than 
exponential time. However, an approximate algorithm that finds sequences of 
reasonable quality is (for k actions) : 

Algorithm 1 
1) For i = 1, ..., k 

a) Sort actions that haven't been performed wrt. Pi/Ci and pick the best one, A*.

b) Assume that A* is performed and adjust cause probabilities wrt. A* failing:
P(Fj | A*=no, e)

By exploiting the fact that all actions and questions are independent given 
knowledge of the cause, it is possible to compute a good sequence of actions using the 
above algorithm very efficiently using Equation (12) below for calculating action 
probabilities, and Equation (13) below for updating cause probabilities conditional on 
an action failing. 

Equations (12) and (13) exploit the fact that actions are independent of any other 
evidence given the cause when we have the simple Bayes structure. Updating the 
action and cause probabilities using these two equations is in fact a very simple belief 
propagation. The simplicity and efficiency are made possible by the single-fault
assumption and simple Bayes structure.

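$$\begin{aligned}
P(A = \text{yes} \mid e) &= \sum_{F_i} P(A = \text{yes} \mid F_i = \text{yes}, e) \times P(F_i = \text{yes} \mid e) \\
&= \sum_{F_i} P(A = \text{yes} \mid F_i = \text{yes}) \times P(F_i = \text{yes} \mid e)
\end{aligned} \tag{12}$$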







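$$P(F \mid A = \text{no}, e) = \frac{P(A = \text{no} \mid f, e) \times P(F \mid e)}{P(A = \text{no} \mid e)} = \frac{P(A = \text{no} \mid f) \times P(F \mid e)}{P(A = \text{no} \mid e)} \tag{13}$$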

The expected cost of repair of the sequence found with Algorithm 1 is found as 

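$$\begin{aligned}
ECR &= C_1 + P(A_1 = \text{no})\, C_2 + \cdots + P(A_1 = \text{no}, \ldots, A_{k-1} = \text{no})\, C_k \\
&= C_1 + P(A_1 = \text{no})\, C_2 + P(A_2 = \text{no} \mid A_1 = \text{no})\, P(A_1 = \text{no})\, C_3 + \cdots \\
&\quad + P(A_{k-1} = \text{no} \mid A_1 = \cdots = A_{k-2} = \text{no})\, P(A_{k-2} = \text{no} \mid A_1 = \cdots = A_{k-3} = \text{no}) \cdots P(A_2 = \text{no} \mid A_1 = \text{no})\, P(A_1 = \text{no})\, C_k
\end{aligned} \tag{14}$$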

All the probabilities P(A_1=no), P(A_2=no | A_1=no), ..., P(A_{k-1}=no | A_1=no, ...,
A_{k-2}=no) are determined in Algorithm 1.
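Algorithm 1 and the expected cost of repair can be sketched in a few lines of Python. This is a minimal illustration under the single-fault and simple Bayes assumptions, not the SACSO implementation; the function name `greedy_sequence`, the action and cause names, and all numbers are hypothetical.

```python
def greedy_sequence(p_cause, p_solve, cost):
    """Greedy P/C ordering of actions plus the expected cost of repair.

    p_cause: dict cause -> P(F=cause | e), summing to 1
    p_solve: dict action -> dict cause -> P(A=yes | F=cause)
    cost:    dict action -> cost C of performing the action
    """
    p_cause = dict(p_cause)
    remaining = sorted(p_solve)          # sorted only to make ties deterministic
    sequence, ecr, p_reach = [], 0.0, 1.0  # p_reach = P(all earlier actions failed)
    while remaining:
        # Equation (12): P(A=yes | e) = sum_F P(A=yes | F) P(F | e)
        p_yes = {a: sum(p_solve[a].get(f, 0.0) * p for f, p in p_cause.items())
                 for a in remaining}
        best = max(remaining, key=lambda a: p_yes[a] / cost[a])
        remaining.remove(best)
        sequence.append(best)
        ecr += p_reach * cost[best]      # one term of the ECR sum
        p_no = 1.0 - p_yes[best]
        if p_no <= 0.0:
            break                        # the chosen action certainly succeeds
        # Equation (13): P(F | A=no, e) = P(A=no | F) P(F | e) / P(A=no | e)
        p_cause = {f: (1.0 - p_solve[best].get(f, 0.0)) * p / p_no
                   for f, p in p_cause.items()}
        p_reach *= p_no
    return sequence, ecr


# Hypothetical model: three causes, three actions, one action helps two causes.
p_cause = {"toner": 0.5, "cable": 0.3, "driver": 0.2}
p_solve = {
    "replace_toner": {"toner": 1.0},
    "reseat_cable": {"cable": 0.9},
    "reinstall_driver": {"driver": 1.0, "cable": 0.1},
}
cost = {"replace_toner": 10.0, "reseat_cable": 1.0, "reinstall_driver": 5.0}
sequence, ecr = greedy_sequence(p_cause, p_solve, cost)
```

Each pass touches every remaining action and every cause once, so the whole sequence is found in time proportional to k² times the number of causes, without a general belief propagation.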



5.2 Does It Pay to Ask a Question First? 

To check whether it pays to ask a question first according to the Breese- 
Heckerman approach, we compare the expected cost of asking the question first and 
then performing a sequence of actions with just performing a sequence of actions 
without asking the question. 

As described in Section 3.2, it is necessary to calculate ECO_Q for each question to
check whether it pays to ask it first. This can be done quite efficiently and exactly by 
exploiting the fact that knowledge of the cause renders everything else independent in 
the Simple Bayes structure. First, the cause probabilities are updated given the state 
of the question using the following equation : 

$$P(F \mid Q = s, e) = \frac{P(Q = s \mid f, e) \times P(F \mid e)}{P(Q = s \mid e)} = \frac{P(Q = s \mid f) \times P(F \mid e)}{P(Q = s \mid e)} \tag{15}$$

When the cause is given, the probability of the question is independent of all other
evidence in the network, thus P(Q=s | f,e) = P(Q=s | f). When the probabilities of the
causes have been updated, we can find the new probabilities of the actions using 
Eq. (12). From the updated probabilities for the actions, P(Ai=yes | Q=s, e), we can 
then find ECR(Q=s). 
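The computation described in this section can be sketched as follows. The sketch assumes that the expected cost of asking first is the cost of the question plus the answer-weighted values of ECR(Q=s), which is the form of the bracketed term in Equation (17); the function names, the cost of the question and all numbers are hypothetical.

```python
def ecr_greedy(p_cause, p_solve, cost):
    """Expected cost of repair of the greedy P/C sequence (single fault)."""
    p_cause, remaining = dict(p_cause), sorted(p_solve)
    ecr, p_reach = 0.0, 1.0
    while remaining:
        p_yes = {a: sum(p_solve[a].get(f, 0.0) * p for f, p in p_cause.items())
                 for a in remaining}
        best = max(remaining, key=lambda a: p_yes[a] / cost[a])
        remaining.remove(best)
        ecr += p_reach * cost[best]
        p_no = 1.0 - p_yes[best]
        if p_no <= 0.0:
            break
        p_cause = {f: (1.0 - p_solve[best].get(f, 0.0)) * p / p_no
                   for f, p in p_cause.items()}
        p_reach *= p_no
    return ecr


def eco_question(p_cause, p_answer, question_cost, p_solve, cost):
    """Expected cost of asking Q first and then acting on the answer.

    p_answer: dict answer -> dict cause -> P(Q=answer | F=cause)
    """
    total = question_cost
    for likelihood in p_answer.values():
        # P(Q=s | e) = sum_F P(Q=s | F) P(F | e)
        p_s = sum(likelihood[f] * p for f, p in p_cause.items())
        if p_s == 0.0:
            continue  # this answer cannot occur
        # Equation (15): P(F | Q=s, e) = P(Q=s | F) P(F | e) / P(Q=s | e)
        posterior = {f: likelihood[f] * p / p_s for f, p in p_cause.items()}
        total += p_s * ecr_greedy(posterior, p_solve, cost)
    return total


# Hypothetical model in which the question identifies the cause exactly.
p_cause = {"toner": 0.5, "cable": 0.5}
p_solve = {"replace_toner": {"toner": 1.0}, "reseat_cable": {"cable": 1.0}}
cost = {"replace_toner": 10.0, "reseat_cable": 8.0}
p_answer = {
    "faded": {"toner": 1.0, "cable": 0.0},
    "blank": {"toner": 0.0, "cable": 1.0},
}
ecr = ecr_greedy(p_cause, p_solve, cost)
eco_q = eco_question(p_cause, p_answer, 1.0, p_solve, cost)
```

With these numbers the question pays for itself: asking first costs less in expectation than acting without it.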



5.3 Limited n-Step Lookahead for Questions 



With Algorithm 1, we have an approximate n-step lookahead procedure for actions.
However, we only have one-step lookahead for questions as we only compare two
situations: (i) Q is not asked at all, and (ii) Q is asked first. To improve upon this, we
have introduced P_Q = P(Q identifies cause):



$$P_Q = \max_{F} \left( \max_{s} \frac{P(F \mid Q = s, e) - P(F \mid e)}{1 - P(F \mid e)} \times P(Q = s \mid e) \right) \tag{16}$$



This equation finds the maximum relative increase in the probability of a cause if
question Q is answered. The increase is considered relative to the maximum increase
possible, from P(F) to one. Further, the maximum increase is multiplied by the
probability of getting the relevant answer to the question. P_Q is related to the MYCIN
certainty factors [13].

Using the P_Q value it is possible to treat questions as actions when finding a good
sequence using Algorithm 1. This gives us a more extended lookahead for questions,
as it allows us to examine whether it pays to ask the question to identify causes at any
position in the sequence.
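Equation (16) can be evaluated directly from the cause probabilities and the question's conditional probability table. The following Python sketch is one possible reading of the formula; the function name and the example tables are hypothetical.

```python
def question_identifies_cause(p_cause, p_answer):
    """Equation (16): the largest relative increase in any cause probability,
    weighted by the probability of the answer that produces it.

    p_cause:  dict cause -> P(F=cause | e)
    p_answer: dict answer -> dict cause -> P(Q=answer | F=cause)
    """
    best = 0.0  # some cause never decreases, so the maximum is at least zero
    for likelihood in p_answer.values():
        p_s = sum(likelihood[f] * p for f, p in p_cause.items())
        if p_s == 0.0:
            continue  # this answer cannot occur
        for f, prior in p_cause.items():
            if prior >= 1.0:
                continue  # cause already certain, no increase is possible
            posterior = likelihood[f] * prior / p_s
            best = max(best, (posterior - prior) / (1.0 - prior) * p_s)
    return best


p_cause = {"toner": 0.5, "cable": 0.5}
# A question whose answer pins down the cause, and one that says nothing.
decisive = {"faded": {"toner": 1.0, "cable": 0.0},
            "blank": {"toner": 0.0, "cable": 1.0}}
useless = {"yes": {"toner": 0.5, "cable": 0.5},
           "no": {"toner": 0.5, "cable": 0.5}}
p_q_decisive = question_identifies_cause(p_cause, decisive)
p_q_useless = question_identifies_cause(p_cause, useless)
```

The resulting value can then take the place of P(A=yes) for the question when it is ranked among the actions by probability over cost.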



5.4 Does It Pay to Wait with the Question? 



The Breese-Heckerman algorithm compares two situations : 

1. Performing the best sequence of steps, never asking the question. The expected 
cost of this is ECR. 

2. First asking the question, and then performing the best sequence of steps based on 
the answer. The expected cost of this is ECO_Q.

This comparison is biased towards the question, as it is often better to ask the 
question if the alternative is to never ask the question at all. Thus, questions may be 
selected because the troubleshooter assumes that it is now or never. This is an 
incorrect assumption and we can remedy the problem by giving the troubleshooter 
one more choice : 



3. Perform the best action, then ask the question, and perform the best sequence of the
remaining steps based on the answer. The expected cost of this is ECO_Q^{(1)}.



Thus, when ECO_Q is lower than ECR, we can double-check whether it really pays
to ask the question first by computing ECO_Q^{(1)}. If ECO_Q^{(1)} is lower than ECO_Q, then
it is better to postpone the question until later. Again, this check can be performed
efficiently as it does not require any extra belief propagations. ECO_Q^{(1)} can be found
as follows, where A_1 is the best action to perform first, C_Q is the cost of the
question, and ECR^{(-1)}(Q=s) is the expected cost of the best sequence of actions
where A_1 has been excluded.

$$ECO_Q^{(1)} = C_1 + P(A_1 = \text{no}) \left( C_Q + \sum_{Q=s} P(Q = s) \times ECR^{(-1)}(Q = s) \right) \tag{17}$$
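As a worked illustration of Equation (17) and of the three-way comparison, the sketch below plugs in hypothetical numbers. The function names and the returned labels are illustrative only; the inputs are assumed to have been computed beforehand as described in Sections 5.1 and 5.2.

```python
def eco_postponed(c_first, p_first_fails, c_question, p_answer, ecr_rest):
    """Equation (17): C_1 + P(A_1=no) * (C_Q + sum_s P(Q=s) * ECR^(-1)(Q=s)).

    p_answer: dict answer -> P(Q=answer)
    ecr_rest: dict answer -> ECR of the remaining actions given that answer
    """
    expected_rest = sum(p_answer[s] * ecr_rest[s] for s in p_answer)
    return c_first + p_first_fails * (c_question + expected_rest)


def next_step(ecr, eco_q, eco_q_1):
    """Choose between the three situations compared in this section."""
    if eco_q >= ecr:
        return "act"             # the question does not pay at all
    if eco_q_1 < eco_q:
        return "act, ask later"  # cheaper to postpone the question
    return "ask now"


# Hypothetical values for the first action, the question and the two answers.
eco_q_1 = eco_postponed(
    c_first=2.0,
    p_first_fails=0.6,
    c_question=1.0,
    p_answer={"yes": 0.4, "no": 0.6},
    ecr_rest={"yes": 5.0, "no": 10.0},
)
decision = next_step(ecr=9.0, eco_q=8.0, eco_q_1=eco_q_1)
```

Here the question beats never asking (8.0 against 9.0), but performing the best action first is cheaper still, so the question is postponed.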



6 Discussion and Conclusions 

The scheme for gathering prior probabilities described in Section 4.2 was
developed because the inefficiency of the current standard methods quickly became
apparent. These methods, in which prior probabilities are estimated
without context, forced the domain experts to estimate probabilities of very rare
events with very little information to base them on. As a result, the process was
slow and the results were inaccurate. Instead, it was decided to follow the scheme
presented in this paper, which resulted in a tremendous speed-up: probabilities for
a 2000-variable printing system network were gathered in only one week, on
average 3 probabilities per minute.







Much information was gained from having the domain experts verify the first 
prototypes of the troubleshooting models. Each time the experts noticed something 
behaving differently than they expected, it was either because of missing information 
in the model, or their troubleshooting processes not being completely logical. During 
the verification of the prototypes, two missing components of the cost were 
determined, namely risk and insult levels. 

As mentioned in Section 4.2, the experts elicit probabilities based on the single- 
fault assumption for the simple reason that it greatly reduces the time consumption of 
the process. If multiple simultaneous faults are not ruled out, it is necessary to use 
other assumptions to prevent the experts from having to elicit probabilities for every 
possible combination of causes. For example, an assumption of independent causes 
could be used [14]. 



References 



1. Andersen, S.K., Olesen, K.G., Jensen, F.V., and Jensen, F. (1989). HUGIN - a
Shell for Building Bayesian Belief Universes for Expert Systems. Proceedings
of the Eleventh International Joint Conference on Artificial Intelligence.

2. Breese, J.S. and Heckerman, D. (1996). Decision-theoretic Troubleshooting: A
Framework for Repair and Experiment. Technical Report MSR-TR-96-06,
Microsoft Research, Advanced Technology Division, Microsoft Corporation,
Redmond, USA.

3. Cooper, G.F. (1990). The Computational Complexity of Probabilistic Inference
using Bayesian Belief Networks. Artificial Intelligence, 42:393-405.

4. Cowell, R.G., Dawid, A.P., Lauritzen, S.L., and Spiegelhalter, D.J. (1999).
Probabilistic Networks and Expert Systems. Springer-Verlag.

5. de Kleer, J. and Williams, B. (1987). Diagnosing multiple faults. Artificial
Intelligence, 32:97-130.

6. Genesereth, M. (1984). The use of design descriptions in automated diagnosis.
Artificial Intelligence, 24:311-319.

7. Heckerman, D., Breese, J., and Rommelse, K. (1995). Decision-theoretic
Troubleshooting. Communications of the ACM, 38:49-57.

8. Henrion, M., Pradhan, M., del Favero, B., Huang, K., Provan, G., and O'Rorke,
P. (1996). Why is Diagnosis using Belief Networks Insensitive to Imprecision in
Probabilities? Proceedings of the Twelfth Conference on Uncertainty in
Artificial Intelligence.

9. Jensen, F.V. (1996). An Introduction to Bayesian Networks. UCL Press,
London.

10. Jensen, F.V., Lauritzen, S.L., and Olesen, K.G. (1990). Bayesian Updating
in Causal Probabilistic Networks by Local Computations. Computational
Statistics Quarterly, 4:269-282.

11. Lauritzen, S.L. and Spiegelhalter, D.J. (1988). Local Computations with
Probabilities on Graphical Structures and their Applications to Expert Systems.
Journal of the Royal Statistical Society, Series B, 50(2):157-224.

12. Shenoy, P.P. and Shafer, G. (1988). An Axiomatic Framework for Bayesian and
Belief-Function Propagation. Proceedings of the AAAI Workshop on
Uncertainty in AI, pp. 307-314.

13. Shortliffe, E.H. (1976). Computer-based Medical Consultations: MYCIN.
American Elsevier Publishers, New York.

14. Srinivas, S. (1995). A Polynomial Algorithm for Computing the Optimal Repair
Strategy in a System with Independent Component Failures. Proceedings of the
Eleventh Conference on Uncertainty in Artificial Intelligence, Montreal, Canada,
August 1995.




Using XML and Other Techniques to Enhance 
Supportability of Diagnostic Expert Systems 



Graham Forsyth¹ and John Delaney²

¹ DSTO, Airframes and Engines Division, Melbourne, Australia
Graham.Forsyth@dsto.defence.gov.au

² eVision Pty Ltd, Melbourne, Australia
JDelaney@evision.com.au



Abstract. Over the last year, interest in a new language called eXtensible
Markup Language (XML) has risen remarkably. XML, a key part of many
electronic commerce strategies, may also help make diagnostic software more
supportable and hence more useful. Until now, Diagnostic Expert Systems
usually used “shells” comprising an inference engine, user interfaces and a means
of storing the rules. Our new design incorporates the JavaScript inference
engine and the user interface in code for a standard web browser and translates the
rules into XML. Supportability is improved by clear separation into: (i)
standard commercial software maintained by the computing contractor, (ii) active
web-pages maintainable by the software contractor, and (iii) the commercial
database of domain knowledge maintained by the local domain experts as
electronic documentation. A field trial diagnosing TF30 engines in RAAF F-111s is
developing procedures and specifications for its production.



Introduction 

Military hardware tends to stay in service for very long periods of time. The service 
life of many military aircraft often exceeds 30 years (and may exceed 50 years) and 
this contrasts with the often very short life cycle of computing hardware and software. 
Paradoxically, the long service life of aircraft leads to the need for computer-based 
support systems for these aircraft while the short service life of specific computer 
software and hardware generates a new problem with the supportability of the support 
systems themselves. 

In the late 1980s [1 to 4], the Royal Australian Air Force (RAAF) investigated the 
use of computer-based technology to improve diagnostic troubleshooting methods for 
various aircraft systems including jet engines. The diagnostic deficiencies at the time
led to a concentration on the TF-30 engine in the F-111 aircraft and an eventual
decision to build a Concept Demonstrator of such technology.

This Concept Demonstrator [5 to 9] was given the name “Interactive Fault Diagno- 
sis and Isolation System” (IFDIS) and was produced by Competitive Advantage 
Technology Pty Ltd under direction from the Defence Science and Technology Or- 
ganisation (DSTO). Although limited field trials of this system were successful, pro- 
posals to deploy IFDIS in production did not eventuate. However, the prototype unit 
was used in published research [10 to 13]. 



R. Loganantharaj et al. (Eds.): IEA/AIE 2000, LNAI 1821, pp. 380-390, 2000.
© Springer-Verlag Berlin Heidelberg 2000 







Recently, the need to extend the service life of these engines has generated new in- 
terest in IFDIS. Unfortunately, the software generated by the 1988 project will not run 
on current computing platforms. This has led to a study of methods to increase the 
likelihood of the 1999 version still being supportable in 2009. Before looking at the 
new version, it will be useful to examine the original version with the benefit of hind- 
sight (spanning 10 years). 



The 1988 Version of IFDIS 

IFDIS was aimed at reducing errors in diagnosing problems with the TF30 engines in
the RAAF F-111 aircraft and was designed using best practice of the day; it was
rule-based and the inference engine comprised a commercial “shell” with interface through
windows contained in a C-language program. 




Fig. 1. Software layout of user (left) and maintainer (right) versions of IFDIS 1988. 



Care was taken to ensure that rules were adequately verified and the user interface 
was designed to promote user acceptance of IFDIS as an adviser (authority placed 
within the computer was minimised). There are four areas of note. First, time has 
shown this system to be inadequate in its ability to support technology upgrades; major
revisions would have been needed before now to cover Windows upgrades through 
versions 2, 3, 95, 98 and Windows NT. Second, the software design of the 1988 ver- 
sion did not allow for easy field support since, despite the rule-based approach sepa- 
rating software from the diagnostic rules, maintenance of the rules required proprie- 
tary software. The commercial software shell, which contains the rules and the infer- 
ence engine to interpret them, is protected by an external key inserted in the parallel 
port and the user interface is written as a separate “C” program. A separate arrange- 
ment was adopted for rule maintenance. It used a different version of the commercial 
shell and initially, at least, a different operating system. 

Third, the rules within the rule-base covered only about 30 percent of the possible 
engine problems. This was adequate for some initial use since the “30 percent of 
problems” represent about 70 percent of actual cases and represent an even greater 
fraction of those problems which can be diagnosed with the engine still installed in the 
aircraft. 










[Figure: IFDIS troubleshooting screen for the engine problem “AB Will Not Light When Selected”, with an overlaid question window asking “When the AB light is selected, do the nozzles unlock?”]

Fig. 2 IFDIS Troubleshooting Screen with Question Window

The final point concerns the user interface. Troubleshooting is represented as ques- 
tion and answer sessions with overlaying windows (as per Fig 2) displaying the ques- 
tions and accepting answers by choice from a list of possibilities or by clicking on 
“Don’t Know”. Help and other information displays also produce additional superim- 
posed windows. This arrangement received due acceptance from users but modifica- 
tion of the main troubleshooting screen rather than adding layers on top was recom- 
mended. 



Phases for the 1998 Version 

Recent requests by the RAAF for DSTO to investigate methods of utilising the rules, 
at least, of the 1988 IFDIS Concept Demonstrator have led to recognition that the role 
of IFDIS is still useful but the cost of deploying such a system would be prohibitive 
unless supportability could be markedly improved. 

The proposal to improve supportability (and fix the other points noted above) is
based on increased use of commercial software and a client-server three-tier
architecture including the use of eXtensible Markup Language (XML) [14]. A description of
the new software arrangement will be a major component of this paper. 

The project has been implemented using three distinct phases :- 
Feasibility and Useability Demonstration. This phase produced a feasibility study 
of the proposed system, generated an initial design, built a system using the new soft- 
ware design but employing the 1988 rules, and then provided a study of both the 
useability and usefulness of the system produced. The output from this phase was a set 
of documents [15,16] formulating the design and a functional prototype system. 

Trial Version. The software from the first phase was further developed to allow for 
information collection (via printing, commenting and session-saving functions) during 
a trial period at the RAAF base. The rules-base was left substantially as for the origi- 
nal 1988 Concept Demonstrator. The required outputs from this phase are definitions
of the roles of each involved agency in the support of the final production system and
the Statements of Requirements for the production system itself.

Production Version. If the RAAF decides to proceed with production after the trial
period, tenders for contracts will be called based on the outputs of the two previous 
phases. 



Features and Standards of the New Design 

The new version of IFDIS comprises a database and server, a web-server, a web
browser and a set of interactive (active) server pages. Most of the software used now 
comprises standard commercially-available packages; that includes the database en- 
gine, the active web server and the web browser. The specific software code (mainly 
associated with the active server pages) and the database of engine-specific informa- 
tion (data and rules) form the other two independent parts of the system. These spe- 
cific software items comprise under 1 percent of the total software used, unlike the 
1988 version where most of the software was either specific to IFDIS or related to the 
expert system shell. Possibly the greatest difference between the 1988 and current 
IFDIS concerns the use of industry standards to ensure the greatest possible scope of 
supportability for the system; some of these standards are introduced here.

Markup Languages. Two variations on this approach (Runoff and TeX) became
standards. Runoff was inherently simple; at least two characters were reserved to
“frame” layout commands inserted in the text and later versions allowed for script
programs and internal definition of new commands making the language extensible.

SGML. Because military hardware tends to be used over very long periods,
maintenance of documents associated with such systems can be quite difficult. Standard
Generalized Markup Language (SGML), ISO 8879:1986, was defined in order to keep
electronic documentation readable and maintainable for up to 50 years.

HTML. In order for webpages to be available anywhere in the world on any
computer system, a language called HyperText Markup Language (HTML) is used.
Developed in the early 1990s, HTML is a specific instance of SGML with emphasis on
graphical information and with the main output being a screen rather than paper.

Java and JavaScript. Java is an interpreted language that is often used to extend the
features of HTML pages. Applets may be written in Java, and scripts in JavaScript,
which is a separate scripting language despite the similar name.

DHTML. The Dynamic HTML is an extension of HTML to allow pages to change 
by changing only elements on the page rather than replacing the whole page. 

ASP. Active Server Pages (ASP) are implementations of DHTML in which calls to 
databases and other sources of information are used to establish the DHTML. 

XML. eXtensible Markup Language (XML) [14] is an almost complete
implementation of SGML in the HTML environment. Tags within the language are defined within
data dictionaries and may be full programs. Depending on the implementation, XML 
programs may run in the browser itself (e.g. Microsoft Internet Explorer 5, Netscape 
5, etc.) or be implemented by JavaScript interpreters as “plug-ins”. Whereas HTML is 
aimed at the display of text and other files to users, XML is designed to implement 
structured data and structured programs within the browser. Both XML and HTML
are purely text in form, look similar and consist of matched pairs of tags, but partial
files and ambiguity are not allowed in XML, unlike HTML, which continues to display
text even after errors or with incomplete files.
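The difference in strictness can be seen with a small Python sketch that uses only the standard library parsers. The sample markup and the helper names are hypothetical; the HTML parser here stands in for a browser's tolerant behaviour and is not a browser itself.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collects the text content of an HTML fragment, ignoring tag errors."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def xml_is_well_formed(text):
    """True if an XML parser accepts the text, False if it rejects it."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


def html_text(text):
    """Text that a tolerant HTML parser still recovers from the fragment."""
    parser = TextCollector()
    parser.feed(text)
    parser.close()
    return "".join(parser.chunks)


BROKEN = "<p>Check the <b>igniter</p>"       # the <b> tag is never closed
FIXED = "<p>Check the <b>igniter</b></p>"

broken_as_xml = xml_is_well_formed(BROKEN)   # rejected as XML
fixed_as_xml = xml_is_well_formed(FIXED)     # accepted as XML
broken_as_html = html_text(BROKEN)           # text is still recovered
```

For rule data this strictness is an advantage: a truncated or mismatched rule file is rejected outright rather than being partly interpreted.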

IETM. Interactive Electronic Technical Manuals (IETM) are specifically those
electronic documents formulated using SGML. Future versions of IFDIS will support
direct calls to the appropriate pages in these files. 

HTTP. HyperText Transfer Protocol (HTTP) is the transfer procedure for
HTML, DHTML and XML. Very compact, it allows for fast transfers over networks 
using data compression “on the fly” and allows for partial transfers so pages can be 
read before the file has been completely received. 

HTTPS. This is the Secure version of HTTP with user authentication. IFDIS could 
use either HTTP or HTTPS depending on network requirements. 

Applets and Scripts. Applets are small applications which are called from within the 
HTML or DHTML source often by association of the file extension of a called link 
within the code. Scripts are short client-side programs written and embedded within 
the HTML source. 



Commercial Packages Used in Trial Version 

All of the versions of the new IFDIS up to and including the version for field trials
include use of the following commercial packages:

• Microsoft Personal Web Server (PWS) with active server page extensions; 

• Microsoft Access 97 Database (as included in Office 97) used with the ODBC32
driver in Windows 95, Windows 98 or Windows NT Version 4; and

• Microsoft Internet Explorer Version 5 (IE5). 

This use of standard commercial packages to form most of the software environment 
greatly reduces the problem of maintenance. A wide range of other commercial pack- 
ages could have been used instead of those on this list, if browsers supporting XML 
were available to allow these packages to be used. 



Design of the Inference Engine. 

As described above, during the period 1987 to 1989, a Concept Demonstrator of the 
Interactive Fault Diagnosis and Isolation System (IFDIS) for the TF-30 engine in the 
F-111 aircraft was built. IFDIS used Nexpert, a generic off-the-shelf expert system 
shell, for the construction of the knowledge base rules and the in-built inference
engine for processing these rules. Competitive Advantage Technology Pty Ltd, the
IFDIS contractor, subsequently (1989 to 1992) developed Diatron, a complete
troubleshooting system tailored to user-aided fault diagnosis. That Diatron inference engine,
re-coded in XML, is the basis for the new IFDIS inference engine design. 

Rule nesting is enabled by assigning a value to a symptom in the conclusion of one 
rule that is a condition of another rule. This enables any level of nested rules. XML 
lists define Actions, Categories, Conclusions, Conditions, Faults, Problems, Reasons, 
Rules, Symptom Choices and Symptoms. An example of such an XML definition is 
included for a rule:- 







<rule id='r299' r_state='not_determined'>
  <problemid>pl</problemid>
  <name>CADC is Possible</name>
</rule>
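A rule definition of this shape can be read with any standard XML parser. The Python sketch below, using the standard library, is only an illustration of how the fields might be extracted; the function name `parse_rule` and the dictionary keys are hypothetical, and the IFDIS engine itself is described as running in the browser.

```python
import xml.etree.ElementTree as ET

# The rule definition shown above, with its element and attribute names kept.
RULE_XML = """
<rule id='r299' r_state='not_determined'>
  <problemid>pl</problemid>
  <name>CADC is Possible</name>
</rule>
"""


def parse_rule(xml_text):
    """Return the rule's id, state, problem id and name as a dictionary."""
    rule = ET.fromstring(xml_text.strip())
    return {
        "id": rule.get("id"),
        "state": rule.get("r_state"),
        "problem_id": rule.findtext("problemid"),
        "name": rule.findtext("name"),
    }


rule = parse_rule(RULE_XML)
```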

The features of the inference engine include:

• The symptom ranking for a selected problem controls the rule inference order. 

• The inference engine can determine a fault status of Indicated, Possible or Not 
Indicated. 

• A Fault Status of Indicated cannot be overwritten by a rule conclusion assigning a 
fault status of Possible or Not Indicated. 

• Rule States can be True, False, or Not Determined. 

• Condition States can be True, False or Not Determined. 

• Symptom States can be Not Answered, Answered, or Changed. (The default is 
Not Answered). 

• Symptom KnownStates can be Known or Not Known. This is to handle “Don’t 
Know” answers. (The default is Known). 

• Symptom values can have “Don’t Know” answers. 

• Non-monotonic reasoning (for changing symptoms and re-assessing) is enabled. 

• The conjunctive “AND” joins the conditions of a rule. That is, there are no dis- 
junctive “OR” condition joins in a rule. 

• Rule nesting is incorporated. It is enabled by the assignment of a symptom in a 
conclusion of a rule that is the condition of another rule. 
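Several of the features listed above (three-valued states, AND-only conditions, “Don’t Know” answers, the precedence of Indicated, and rule nesting) can be sketched together in Python. This is a simplified illustration, not the Diatron or IFDIS engine: rules are fired in list order rather than by symptom ranking, and the rule format, symptom names and fault name are hypothetical.

```python
TRUE, FALSE, NOT_DETERMINED = "True", "False", "Not Determined"


def condition_state(symptoms, symptom, expected):
    """A condition is Not Determined until its symptom has a known value."""
    value = symptoms.get(symptom)  # None models Not Answered or "Don't Know"
    if value is None:
        return NOT_DETERMINED
    return TRUE if value == expected else FALSE


def rule_state(symptoms, conditions):
    """Conditions are joined by AND only; there is no OR."""
    states = [condition_state(symptoms, s, v) for s, v in conditions]
    if FALSE in states:
        return FALSE
    return NOT_DETERMINED if NOT_DETERMINED in states else TRUE


def infer(rules, answers):
    """Fire rules in list order and return the resulting fault statuses."""
    symptoms, faults = dict(answers), {}
    # Repeated passes let a symptom concluded by one rule satisfy a condition
    # of another (rule nesting); len(rules) passes bound the nesting depth.
    for _ in range(len(rules)):
        for rule in rules:
            if rule_state(symptoms, rule["if"]) != TRUE:
                continue
            kind, name, value = rule["then"]
            if kind == "symptom":
                symptoms[name] = value
            elif faults.get(name) != "Indicated":  # Indicated is never overwritten
                faults[name] = value
    return faults


# Hypothetical rules: the first one concludes a symptom used by the next two.
RULES = [
    {"if": [("nozzles_unlock", "no")],
     "then": ("symptom", "ab_control_suspect", "yes")},
    {"if": [("ab_control_suspect", "yes")],
     "then": ("fault", "metering_head", "Possible")},
    {"if": [("ab_control_suspect", "yes"), ("igniter_spark", "yes")],
     "then": ("fault", "metering_head", "Indicated")},
]
known = infer(RULES, {"nozzles_unlock": "no", "igniter_spark": "yes"})
unsure = infer(RULES, {"nozzles_unlock": "no", "igniter_spark": None})
```

Because `infer` recomputes everything from the current answers, changing an answer and calling it again gives a simple form of the re-assessment mentioned above.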



Design of the Database. 

Like the inference engine, the design of the database is based on that of Diatron. The 
database contains all the knowledge and causal relations acquired for the specific 
problem set. It can be maintained as a separate entity to the rest of the system. The 
Trial IFDIS uses Microsoft Access 97 as the database and allows direct calls to this 
database from the server using the Microsoft Open Database Connectivity (ODBC32) 
driver. The database is also used to hold the case library including user comments. 

Rules in this database are simply entries in lists of Categories, Problems, Faults, 
Rules, Actions, etc. This simplifies the database structure but means all conditions are 
joined by “AND” rather than allowing “OR” connections. 

Collection of problem domain data from the database can be simplified by using a 
routine RecreateXML.asp to generate new XML lists from the database. That ap- 
proach reduces database access and reduces network traffic in a served environment. 
It is also a benefit in the quest for supportability since the translation from database to 
XML code is automatic. 
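The idea of regenerating the XML lists from the database can be sketched as follows. The sketch is written in Python with an in-memory SQLite database standing in for Access 97 and ODBC, and the table and column names are assumptions; the actual routine, RecreateXML.asp, is an Active Server Page whose schema is not given here.

```python
import sqlite3
import xml.etree.ElementTree as ET


def recreate_rules_xml(conn):
    """Build an XML list of rules from the (hypothetical) rules table."""
    root = ET.Element("rules")
    query = "SELECT rule_id, problem_id, name FROM rules ORDER BY rule_id"
    for rule_id, problem_id, name in conn.execute(query):
        # Every regenerated rule starts in the not_determined state.
        rule = ET.SubElement(root, "rule", id=rule_id, r_state="not_determined")
        ET.SubElement(rule, "problemid").text = problem_id
        ET.SubElement(rule, "name").text = name
    return ET.tostring(root, encoding="unicode")


# In-memory stand-in for the troubleshooting database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE rules (rule_id TEXT PRIMARY KEY, problem_id TEXT, name TEXT)"
)
conn.execute("INSERT INTO rules VALUES ('r299', 'pl', 'CADC is Possible')")
xml_text = recreate_rules_xml(conn)
```

Generating the lists once, after each database change, means a troubleshooting session reads a static XML document instead of querying the database for every rule.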

Adding a new problem to the IFDIS 98 troubleshooting database requires the in- 
formation that defines the rules to be entered into the tables in a particular order. The 
order results from the dependency of some tables on others, in particular for the pri- 
mary keys that are generated automatically when a new entry is made in a table. Ta- 
bles may be considered to be divided into three levels: 







Level 1 tables (Problems, Faults, Actions and Symptoms) 

These tables do not depend on any others for information. It is possible that some of 
the entries required already exist in the database in relation to other problems. If this is 
the case they need not be entered again. 

Level 2 tables (These tables depend on level 1 tables for some input)

• Problem Faults
  - An entry is required for each fault that can be responsible for the problem.

• Rules

• Symptom Choices
  - New entries will be required only if a new symptom has been added.

• Problem Symptoms
  - An entry is required for each symptom that can be indicative of the problem.

Level 3 tables (These tables derive at least some of their input from a level 2 table)

• Reasons
  - Contains a description of why a fault is the cause of the problem.
  - An entry is required for each rule.

• Conditions
  - Each rule may have a number of conditions joined by the conjunctive AND.

• Conclusions
  - Each rule leads to a single conclusion, with an action relating to the fault in
    question.
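The entry order implied by the three levels can be derived mechanically from the table dependencies. In the Python sketch below the table names follow the lists above, but the individual dependencies are assumptions made for illustration; only the level of each table is taken from the text.

```python
# Hypothetical dependency map: each table lists the tables whose keys it uses.
DEPENDS_ON = {
    "Problems": [], "Faults": [], "Actions": [], "Symptoms": [],
    "Problem Faults": ["Problems", "Faults"],
    "Rules": ["Problems"],
    "Symptom Choices": ["Symptoms"],
    "Problem Symptoms": ["Problems", "Symptoms"],
    "Reasons": ["Rules", "Faults"],
    "Conditions": ["Rules", "Symptom Choices"],
    "Conclusions": ["Rules", "Faults", "Actions"],
}


def table_level(table, depends_on, _path=()):
    """Level 1 tables depend on nothing; others sit one above their deepest input."""
    if table in _path:
        raise ValueError("circular dependency at " + table)
    deps = depends_on[table]
    if not deps:
        return 1
    return 1 + max(table_level(d, depends_on, _path + (table,)) for d in deps)


def insertion_order(depends_on):
    """Tables sorted so that every table follows the tables it depends on."""
    return sorted(depends_on, key=lambda t: (table_level(t, depends_on), t))


order = insertion_order(DEPENDS_ON)
levels = {t: table_level(t, DEPENDS_ON) for t in DEPENDS_ON}
```

Entering data in this order guarantees that the automatically generated primary keys of the lower-level tables exist before a higher-level table refers to them.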

IFDIS is designed to help diagnosis in an interactive fashion. Only two essential 
features of troubleshooting a gas turbine engine are used implicitly in the design of the 
database structure. These are the ability to assign causal relationships as cascading 
fault trees, and the ability to separate the problem domain into a finite number of dis- 
crete recognisable problems. Any other problem domain which showed these same 
features could use the current IFDIS with only a change in the database. This would 
include most gas turbine engine types as well as a wide range of mechanical, pneu- 
matic, hydraulic and electronic/electrical systems. Neither the causal nor the divisibil- 
ity criteria will apply to many computer-based systems since software problems have a 
different structure. 




Fig. 3 New Standard IFDIS User Environments - stand-alone (left) and client-server (right).



User and Maintainer Environments 

The user interaction with a system like IFDIS depends on two separate commodities: 
firstly the environment created by the software layers and secondly the user interface 
itself. These will be discussed separately both for an end-user and for a person seeking
to maintain the system. 










A number of versions of the user environment are possible with this design. The 
standard user environment is shown in Fig. 3 above. The “XML Layer” comprises 
both rules from the database “data” and metarules (rules about rules) from the web 
“data”. The database and web server portions can also be separately placed within a
server connected to the user via a network. The right hand diagram of Fig. 3 shows 
such a server and one of a possible number of users. In both layouts, the shaded areas, 
comprising the two data stores (one in the database and the second the Active Server 
Pages) and the XML layer, are the only areas of IFDIS which are unique to a specific 
implementation of IFDIS; the rest of the environment comprises standard commer- 
cially-available software. 

The database could be maintained on either of the above two environments.
However, the database is fully self-contained so the minimum system for database
maintenance can omit the Browser, Personal Web Server and webpage Data items. Software
maintenance is achieved using a separate development environment which adds
another Microsoft product, Visual InterDev (supplied as part of Visual Studio), to
manage development of the DHTML, HTML and XML code, and JavaScript and
VBScript scripts. The maintainer will thus have separate windows giving separate
access to the InterDev environment, the database and the browser. 

[Figure: screenshot of the new IFDIS troubleshooting webpage for the TF-30 engine, showing a fault table with fault statuses and a question-and-answer panel.]

Fig. 4 Troubleshooting Screen from New IFDIS



User Interface 

The original 1988 IFDIS used a large number of overlapping windows to interface 
with the user. The new version collapses this primarily to a single re-configurable 
webpage. After logon and selection of the troubleshooting guide, a typical user win- 
dow looks like Fig. 4 above. This is divided into five areas: Selection, Troubleshoot,
Fault Table, Description and Q&A. Apart from the initial logon screen and specific 
screens for saving cases, recalling previous cases and printing, all user interaction with 
IFDIS occurs through this screen. Other screens are used for system maintenance 
including rewriting the XML code from the database. 

Information Being Sought by a Field Trial 

Before this technology can become part of the standard tools used to troubleshoot 
problems with these engines, a deployable Production IFDIS needs to be built. A 
major aim of the Trial IFDIS is to define this Production IFDIS by a Statement Of 
Requirement (SOR). Some of the points needing clarification include:- 

• the roles of the RAAF Logistics Maintenance Squadron (LMS) and of DSTO in 
database maintenance; 

• procedures for flight-line use and for test-cell use of the Production IFDIS; 

• specific engine problem selection for flight-line use of the Production IFDIS; 

• any additional problems to be added to available selection for test-cell use of the 
Production IFDIS; 

• compatibility with Standard Operating Environment (SOE) used by computers at 
RAAF establishments; 

• ease of installation of the software; 

• proposals for support to the local user of the Production IFDIS (possibly via a 
local engineering contractor); 

• proposals for support to the SOE component of the Production IFDIS (possibly 
via either SOE or engineering contractors); 

• proposals for support to the software component within the Production IFDIS 
(possibly via the IFDIS contractor); and to the database component within the 
Production IFDIS (possibly via LMS and DSTO). 

The chosen process to perform these tasks involves deployment of a Trial IFDIS with 
each squadron and with each test cell (and maybe other sites at RAAF Base Amber- 
ley) for about one month to collect information. The collected information will then be 
used in the generation of the SOR. 



Conclusion 

A new version of a diagnostic system for a gas turbine engine has been designed and 
developed in which the majority of the software is commercially available and where 
the essential proprietary portions comprise a relatively small collection of computer 
code and an expertise database maintainable by the engine design authority. 

This approach should lead to diagnostic systems which are supportable across a 
number of platforms over several cycles of system upgrades. The technology is be- 
lieved to be applicable to all diagnostic domains which are separable into distinct 
problems and have defined causal links between fault and symptom. 

The authors wish to thank those, particularly Paul Marsden and Philip Joyce, who 
contributed to the work described in this paper. 




Using XML and Other Techniques 389 



References 

1. Georgeff, M. ”An Interactive Fault Diagnosis and Isolation System for the F404 - 
F/A-18 Engine”, ARL Contractor's Report 1983. 

2. Larkin, M.D.; Frith, D.A. and Finlay, A.S. "IFDIS - An Expert System for Diag- 
nosis of Failures in Jet Aircraft Engines”, DSTO Propulsion Technical Memo- 
randum 439, January 1987. 

3. Frith, D.A. "Role of Knowledge Engineering Tools in Making Decisions about 
Physical Systems”, Digital Equipment Computer Users Society (DECUS), Aus- 
tralia, August 1988. 

4. Frith, D.A. "Engine Diagnostics - An Application For Expert System Concepts”, 
Proceedings of 9th International Symposium on Air Breathing Engines, Greece, 
September 1989. 

5. Delaney, J "TF30 IFDIS User Manual”, Competitive Advantage Technology, 
1988. 

6. Forsyth, G.F. and Larkin, M.D. "Concept Demonstration of the Use of Interactive 
Fault Diagnosis and Isolation for TF30 Engines”, Proceedings of the Second In- 
ternational Conference on Industrial and Engineering Applications of Artificial 
Intelligence and Expert Systems, IEA/AIE-89, University of Tennessee Space In- 
stitute, June 1989. 

7. Forsyth, G.F. "An Intelligent Assistant for Fault Diagnosis of Jet Engines”, Pro- 
ceedings of the Digital Equipment Computer Users Society (DECUS), Switzer- 
land, June 1989. 

8. Lockett, R.D. "Commercial Development of ARL's IFDIS Software”, Engineer- 
ing Analysis Services P/L, August 1988. 

9. Forsyth, G.F. "An Intelligent Assistant for Technicians Finding Faults with Jet 
Engines”, Proceedings of the Digital Equipment Computer Users Society (DE- 
CUS), South Pacific, August 1989. 

10. Forsyth, G.F.; Larkin, M.D. and Wallace, G.A. "Verification of Heuristic Knowl- 
edge by Comparison with a Causal/Heuristic Model”, Proceedings of the Third 
International Conference on Industrial and Engineering Applications of Artificial 
Intelligence and Expert Systems, IEA/AIE-90, Charleston SC, June 1990. 

11. Forsyth, G.F. "Verification of Heuristic Knowledge in a Diagnostic Expert Sys- 
tem Using a Causal/Qualitative Model”, Proceedings of the Digital Equipment 
Computer Users Society (DECUS), South Pacific, August 1990. 

12. Wallace, G.A. "Ensuring the Integrity and Veracity of an Interactive Fault Diag- 
nosis and Isolation System for a Gas Turbine Engine”, ARL Propulsion TM471, 
August 1990. 

13. Forsyth, G. F. "Case Study: Aeronautical Research Laboratory - Verification of a 
Diagnostic System”, Proceedings of HR Knowledge Based Systems Conference, 
April 1992. 

14. Bos, B "XML in 10 Points", http://www.w3.org/XML/1999/XML-in-10-points/,
1999.

15. Delaney, J “IFDIS 98 Design, Standards and Data Definitions”, eVision Pty 
Ltd., December 1998. 

16. Delaney, J “IFDIS 98 Final Report”, eVision Pty Ltd., June 1999. 







COPYRIGHT AND TRADEMARKS 

A number of terms used within this document are either trademarks, registered names or 
copyright items owned by other parties. This includes Access, Diatron, IFDIS, Internet Ex- 
plorer, Microsoft, Netscape, Windows, Visual InterDev, Visual Studio and XML. 




Learning and Diagnosis in Manufacturing 
Processes through an Executable Bayesian 

Network 



M.A. Rodrigues^, Y. Liu^, L. Bottaci^, and D.I. Rigas^ 

^ School of Computing & Management, Sheffield Hallam University 
Sheffield, S1 1WB, UK

^ Department of Computer Science, The University of Hull, 

Hull, HU6 7RX, UK 

Abstract. In this paper we present a novel approach to modelling a 
manufacturing process that allows one to learn about causal mechanisms 
of manufacturing defects through a Process Modelling and Executable 
Bayesian Network (PMEBN). The method combines probabilistic rea- 
soning with time dependent parameters which are of crucial interest to 
quality control in manufacturing environments. We demonstrate the con- 
cept through a case study of a caravan manufacturing line using inspec- 
tion data. 

1 Introduction 

This paper describes a method for on-line monitoring and diagnosis of manufac- 
turing defects in a labour intensive manufacturing environment. A case-study 
of a caravan production line is used where management requires on-line infor- 
mation such as frequency of defects, causal relationships, defect targets, work 
force skill levels, and so on. Our approach to the problem is based on Bayesian 
analysis. In the last two decades, Bayesian inference and Bayesian networks 
have been extensively used to simulate and learn about causal mechanisms that 
operate in the environment [1,2,3,4,5,9,16,17]. Bayesian networks are graphical 
models based on probability theory used to gain insights into system behaviour, 
or to forecast a system response to specific actions. Early Bayesian networks used 
message-passing and were limited to singly connected networks or trees [6,7].
Refinements to the tree-propagation method have been proposed, such as node
elimination [14], clique-tree propagation and loop-cut conditioning [8,15]. Our novel
approach combines systems modelling and simulation [10,11,12,13] with Bayesian
inference and graphical executable models in a single framework which is tuned
to manufacturing environments yet sufficiently generic to be applied to automated
or non-automated production lines.
Section 2 describes how manufacturing processes are represented and modelled 
within a Bayesian framework. Section 3 highlights experimental results and fi- 
nally in Section 4 some conclusions are drawn. 

2 Representation and Modelling of Manufacturing 
Processes 

We have devised a common representation framework that allows us to reason 
about aspects of manufacturing data and also to reason about the manufacturing 



R. Loganantharaj et al. (Eds.): IEA/AIE 2000, LNAI 1821, pp. 390-396, 2000.
© Springer-Verlag Berlin Heidelberg 2000






process itself as described in this section. All data and processes are hierarchi- 
cally represented as a set of data defined as: [ Unit Operation Component 
Location Fault Team ] . The assumptions built into such representation are 
highlighted as follows. Suppose that a given production batch (Unit) is divided 
into a number of operations (Operation). We wish to predict the probabilities 
of faults for each operation given our initial understanding of the manufacturing 
process which should then be updated as new sample information is available. In 
order to do so, we denote the operation variables by $\Theta_i$ ($i = 1, 2, \ldots, n$,
where $n$ is the number of operations), the state of the variable $\Theta_i$ by
$\theta_i$, and a set of observations by $R$. The convention used here is that a
variable is denoted by an upper-case letter while the state or value of the variable
by the same letter in lower case. We use Bayes' rule to obtain the probability
distribution for $\theta_i$ given $R$ and background knowledge $\beta$:

$$P(\theta_i \mid R, \beta) = \frac{P(\theta_i \mid \beta)\, P(R \mid \theta_i, \beta)}{P(R \mid \beta)} \qquad (1)$$

where

$$P(R \mid \beta) = \int P(R \mid \theta_i, \beta)\, P(\theta_i \mid \beta)\, d\theta_i \qquad (2)$$



The term $P(R \mid \theta_i, \beta)$ is the likelihood function while $P(\theta_i \mid \beta)$
is the marginal probability for a given operation $\Theta_i$. Both terms require
estimation. The marginal probabilities are the prior probabilities
$P(\theta_i \mid \beta)$ and can be estimated through our belief or knowledge of the
process. Posterior probabilities are then used to update our knowledge of the
process and thus our prior assessment. The likelihood function is more complex
and can be estimated from observation of physical probability distributions. For
instance, for a random variable $X$,

$$P(x \mid \theta, \beta) = f(x, \theta)$$

where $f(x, \theta)$ is the likelihood function with parameters $\theta$, some knowledge
or assumptions are required to solve this problem. A solution can be found by
assuming a finite number of parameters; the variable of interest $X$, whose
states $x$ may be continuous, is then taken to have a Gaussian physical probability
distribution with mean $\mu$ and variance $v$:

$$P(x \mid \theta, \beta) = (2\pi v)^{-1/2}\, e^{-(x-\mu)^2 / 2v}$$

where $\theta = \{\mu, v\}$. However, for the case of manufacturing defects it is not
always possible to make assumptions about the mean and variance of samples. It is
more convenient to assume that events happen over a continuum, such as time, and
are described by a Poisson distribution. In a Poisson distribution, the observed
variable $X$ has a number of occurrences $r$ within a time interval $t$, and $\lambda$
is defined as the intensity of the process. Thus, we can express the likelihood as:

$$P(x \mid \theta_i, \beta) = P(r \mid t, \lambda) = \frac{(\lambda t)^r\, e^{-\lambda t}}{r!}$$

where $r$ is the observed value of occurrences, and the mean and variance of the
distribution are assumed as:

$$E(r \mid \lambda, t) = \lambda t, \qquad V(r \mid \lambda, t) = \lambda t.$$






Thus, for such a time-dependent system the Bayes rule is expressed as:

$$P(\lambda_i \mid x, t, \beta) = \frac{P(x \mid \lambda_i, t, \beta)\, P(\lambda_i \mid \beta)}{P(x \mid t, \beta)} \qquad (3)$$

where independence of events and a constant (stationary) intensity $\lambda$ are
assumed. For the Operation example, we thus assume that each operation is
independent and has its own local probability distribution. Obviously, the joint
probability distribution is dependent on each constituent operation but it also
implies that each operation is independent even when joint probabilities are
known.
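
The time-dependent update above can be illustrated with a minimal sketch. It assumes, purely for illustration, that the intensity is restricted to a small discrete set of candidate rates with prior probabilities, so that the normalising term becomes a sum; the function names and the numbers are hypothetical and are not taken from the PMEBN implementation.

```python
import math

def poisson_likelihood(r, lam, t):
    """P(r | t, lam): probability of r occurrences in an interval t at intensity lam."""
    return (lam * t) ** r * math.exp(-lam * t) / math.factorial(r)

def update_rates(priors, r, t):
    """Posterior over candidate intensities after observing r occurrences in time t.

    priors maps each candidate intensity lam_i to its prior probability
    (a discrete stand-in for P(lam_i | beta); this discretisation is an assumption).
    """
    unnormalised = {lam: p * poisson_likelihood(r, lam, t) for lam, p in priors.items()}
    evidence = sum(unnormalised.values())  # discrete analogue of the normalising term
    return {lam: w / evidence for lam, w in unnormalised.items()}

# Hypothetical example: three candidate defect rates per time unit with equal priors,
# then 6 defects observed over 4 time units.
priors = {0.5: 1 / 3, 1.0: 1 / 3, 2.0: 1 / 3}
posterior = update_rates(priors, r=6, t=4)
for lam, p in sorted(posterior.items()):
    print(f"lambda={lam}: posterior={p:.3f}")
```

Feeding the posterior back in as the prior for the next interval gives the sequential updating described in the text.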

The above form the basis for the Bayesian analysis of manufacturing processes
through the representation framework [Unit Operation Component Location
Fault Team] which has been implemented by a Process Modelling and Executable
Bayesian Network (PMEBN) as follows. For a set of variables
$\Theta = \{\Theta_1, \ldots, \Theta_n\}$, a Bayesian network structure $N$ encodes
independence assertions about the variables in $\Theta$ and a set of local probability
distributions associated with each variable. Together, those components define the
joint probability distribution for $\Theta$. Using the example for Unit and Operation
as in the previous section, if we use $\Theta_i$ to represent both the variable and its
corresponding node for an operation, and $\Pi_i$ to denote the parent unit of node
$\Theta_i$ in $N$, the joint probability distribution for $\Theta$ is given by

$$P(\Theta) = \prod_{i=1}^{n} P(\theta_i \mid \pi_i) \qquad (4)$$

Given the semantics of causal relationships between the variables that can be
readily asserted by production managers, these relationships normally correspond
to assertions of conditional dependence. Thus, in order to build a Bayesian
network, one would simply connect cause variables to their immediate effects.
The next step would be to assess local probability distributions for the parent
node $P(\theta_i \mid \pi_i)$, and this process is repeated for Location and Team.
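
As a rough illustration of the factorisation in equation (4), the sketch below multiplies local conditional probabilities along a tiny Unit-to-Operation network. The node names, states and probability values are hypothetical, and the dictionary layout is only one possible encoding.

```python
def joint_probability(assignment, parents, cpt):
    """Product over all nodes of P(state_i | parent states), as in equation (4).

    assignment: node -> state
    parents:    node -> tuple of parent nodes
    cpt:        node -> {(tuple of parent states, state): probability}
    """
    prob = 1.0
    for node, state in assignment.items():
        parent_states = tuple(assignment[p] for p in parents[node])
        prob *= cpt[node][(parent_states, state)]
    return prob

# Hypothetical two-node network: Unit -> Operation.
parents = {"Unit": (), "Operation": ("Unit",)}
cpt = {
    "Unit": {((), "batch_A"): 0.6, ((), "batch_B"): 0.4},
    "Operation": {
        (("batch_A",), "fault"): 0.1, (("batch_A",), "ok"): 0.9,
        (("batch_B",), "fault"): 0.3, (("batch_B",), "ok"): 0.7,
    },
}

p = joint_probability({"Unit": "batch_B", "Operation": "fault"}, parents, cpt)
print(p)
```

The same function extends to deeper hierarchies (Component, Location, Fault, Team) by adding nodes and their parent tuples.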

The implemented PMEBN model, which doubles as a modelling and pro- 
gramming environment, is easily understood within the context of quality in- 
spection. The system building blocks (Unit, Operation, Component, Location, 
Fault, Team) are selected from menu buttons and dragged into the modelling
window and connected together according to their hierarchical definition. The 
hierarchy is strict, but it can be loosened in the sense that missing layers of 
building blocks are allowed. As soon as any building block is created, the system 
automatically declares predefined functions whose parameters are then provided 
through dialogue boxes. Once the model (interconnected picture) is built, it can 
be run in two distinct modes: (1) in Bayesian inference mode, in which posterior 
probabilities are evaluated providing insights into the process parameters under 
study and also providing facts on various parameters; (2) in diagnosis mode, in 
which data analysis is performed and probabilities for given faults are estimated 
based on past history and new sample information. An example of a realistic 
model developed with our industrial partner is described in the next section. 






3 Experimental Results 

We would like to stress that, in model building with our proposed method, only 
a number of simple steps and decisions are required: (i.) divide a manufacturing 
process into one or more operations (this obviously need not correspond to actual 
physical processes if a simulation of a what-if scenario is desired); (ii.) assign
assembly components for each operation; (iii.) assign optional locations for each
component; (iv.) select fault codes for which statistical analyses are required; 
(v.) assign component codes to teams for which statistical analyses are required. 




Fig. 1. Left, an incomplete model; right, a full model. 



We have modelled the manufacturing process of a particular model of a car- 
avan as defined by the production manager for one of the production lines. The 
model is depicted in Figure 1 above. The model and the modelling process highlight
important aspects of our method as follows. First, batches can be modelled
independently and saved together with their respective inspection data. Second,
an arbitrary number of operations can be defined for each process under
investigation. In the same way, any arbitrary combination of components, locations,
and faults can be investigated. Third, if it is desired to focus on only a number 
of operations at a time, the model can be incomplete, as depicted in Figure 1 
left. This does not cause any run-time errors and the data are fully analysed. 

For experimental validation of the system, we have used a sub-set of data 
from real inspection sheets so that the problem is constrained to a manageable 
complexity. While the full set of component codes was used, the fault codes
were categorised into 3 groups only. Similarly, teams were limited to 5 to match
the number of operations (in reality, there are 17 teams in the production line).
A similar simplification was also made for location, which only assumes 3 different
values. Even with such simplifications, the number of possible combinations for
operation/component/location/fault is very large.

The plots depicted in Figure 2 are outputs of the model running in Bayesian 
inference mode. Since each caravan takes 15 minutes to be manufactured, delta 
time is thus set to dt = 15min. This means that each unit of time represents data 








Fig. 2. Running the model in Bayesian inference mode: left, simple facts such 
as number of faults per component. Right, posterior probabilities for assumed 
rate of faults per operation. 

for 4 caravans and that one full day of production equates to 32 units of time. 
On the left of Figure 2 above, simple statistics can be shown such as number of 
faults per component. On the right, a number of operations have been defined for 
the manufacturing process. Given background knowledge from the production 
and quality managers, expected rate of defects ($\lambda$) and corresponding
probabilities were defined for each operation. Such rates can be seen as manufacturing
targets for a particular operation whose probabilities are updated with new sam- 
ple information. The curves show that only one of the given operations has a 
high probability of staying within its target for manufacturing defects and this 
represents vital information previously unavailable to the quality manager. In 
the same way, the model allows reasoning about levels of skills, as these can be 
defined in terms of desired rate of manufacturing defects with prior probabilities 
assessed by the production manager. The desired rate of defects may represent 
different levels of skills such as experienced, trained, and trainee. As new sam- 
ple information is available, probabilities are updated and this may indicate to 
the quality manager the need for training or for redesigning a process. Because the
model is defined as time based, its outputs also give a clear indication of the time 
of day when performance was at its peak or otherwise. In this way, patterns and 
trends can be learned and actions taken accordingly. 




Fig. 3. Performing diagnosis: left, major faults, centre: unattributed faults; right: 
probabilities for teams responsible for unattributed faults. 

Figure 3 depicts results in diagnosis mode. On the left, faults with high fre- 
quency are displayed. Middle, unattributed or unknown faults. These faults are 






thus subject to causal analysis based on past history. On the right, probabilities 
for teams as evaluated by the system for unattributed faults. In addition to diag- 
nosis, the system also performs interdependency analysis which is to determine 
which teams are introducing faults on components that have been assembled by 
other teams. This is only possible due to the adopted representation framework 
which allows the separate definition of components that are part of an operation 
and components that fall into a team’s responsibility which are not necessarily 
the same. 
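
The paper does not spell out how team probabilities for unattributed faults are computed. As one plausible reading of "based on past history", the sketch below estimates them as relative frequencies over past attributed inspection records; the record layout, team names and data are hypothetical.

```python
from collections import Counter

def team_probabilities(history, component, fault):
    """Relative frequency of each team among past records with the same component and fault.

    history is a list of (component, fault, team) tuples from attributed inspections.
    Returns an empty dict when no matching record exists.
    """
    counts = Counter(
        team for comp, flt, team in history if comp == component and flt == fault
    )
    total = sum(counts.values())
    if total == 0:
        return {}
    return {team: n / total for team, n in counts.items()}

# Hypothetical inspection history.
history = [
    ("door", "scratch", "team_1"),
    ("door", "scratch", "team_2"),
    ("door", "scratch", "team_1"),
    ("window", "leak", "team_3"),
]

probs = team_probabilities(history, "door", "scratch")
print(probs)
```

A fuller treatment would combine these frequencies with prior beliefs about each team, in the same way as the rate update in Section 2.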

4 Summary and Conclusion 

The advantages of the method implemented by the Process Modelling and Ex- 
ecutable Bayesian Network (PMEBN) are summarised as follows. (1.) Defects 
often display a time dependency which quality managers are keen to capture and 
understand. This is built into the model in a very intuitive way through intensity 
of process, Bayesian probabilities, and dynamical modelling; (2.) although joint 
probability distributions are obviously dependent on various manufacturing pa- 
rameters, these parameters are explicitly represented as independent and remain 
so. Quality managers can then learn their relative influence so that proper ac- 
tions can be taken; (3.) the causal information encoded into the PMEBN method 
helps the analysis of sequence information and their interdependencies, such as 
the effects of sequence of assembly operations on the attribution of defects to 
teams; (4.) the hierarchically based representation framework for manufacturing
parameters as [Unit Operation Component Location Fault Team] proved
to be a powerful tool for inference and diagnosis, even for incomplete models; (5.) 
the graphical model is executable, which means that when the picture model is 
completed, so is the programming. This makes it particularly attractive to qual- 
ity managers with no programming or advanced statistical knowledge, who can 
learn about the process and build what-if scenarios literally in a few minutes; (6.) 
similarly, the mapping between the topology of a PMEBN model and the manu- 
facturing environment allows easy reconfiguration of the network in response to 
changing conditions resulting in process learning by the manager and automatic 
diagnostic reasoning on novel situations by the network. 

Acknowledgments 

The work described in this paper is funded by the UK Engineering and Physical 
Sciences Research Council through grant EPSRC GR/L19508. 



References 

1. J. Bernardo and A. Smith (1994). Bayesian Theory, John Wiley and Sons, New
York.

2. W. Buntine (1996). A Guide to the literature on learning graphical models. IEEE
Transactions on Knowledge and Data Engineering, 8:195-210.

3. D. Chickering and D. Heckerman (1996). Efficient approximation for the marginal
likelihood of incomplete data given a Bayesian network. Technical Report
MSR-TR-96-08, Microsoft Research, Redmond, WA.

4. D. Heckerman (1996). A Tutorial on Learning with Bayesian Networks. Technical
Report MSR-TR-95-06, Microsoft Research, Redmond, WA.

5. D. Heckerman and R. Shachter (1995). Decision-Theoretic Foundations for Causal
Reasoning, Journal of Artificial Intelligence Research, 3:405-430.

6. J. H. Kim and J. Pearl (1983). A computational model for combined causal and
diagnostic reasoning in inference systems. 3rd Int Conf KR&R, CA, 661-672.

7. J. Pearl (1982). Reverend Bayes on inference engines: a distributed hierarchical
approach. AAAI Nat Conf on AI, PA, 133-136.

8. J. Pearl (1988). Probabilistic Reasoning in Intelligent Systems, Morgan Kaufmann,
San Mateo, CA.

9. J. Pearl (1994). Bayesian Networks, UCLA Technical Report R-216.

10. M. A. Rodrigues and Y. F. Li (1996). Dynamic Systems Modeling and Development
of Robotic Applications in a Visual Programming Environment. Int J Robotics and
Computer Integrated Manufacturing, 12(4):311-320.

11. M. A. Rodrigues and L. Bottaci (1998). Modelling and Simulation of Customer-
Manufacturer Interactions in a Labour Intensive Manufacturing Environment. 17th
Int Conf Modelling, Identification and Control, Grindelwald, 50-53.

12. M. A. Rodrigues and L. Bottaci (1998). An AI Modelling Approach to Understanding
a Complex Manufacturing Process. Lecture Notes in AI 1415: Methodology and
Tools in Knowledge-Based Systems, Springer-Verlag, 814-821.

13. M. A. Rodrigues and L. Bottaci (1998). Use of Simulation to Evaluate the Benefits
of Timely Defect Information. IEE Int Conf on Simulation: Innovation Through
Simulation, York, 235-239.

14. R. D. Shachter (ed.) (1990). Special Issue on Influence Diagrams. Networks: An
International Journal, 20(5).

15. G. Shafer and J. Pearl (eds.) (1990). Readings in Uncertain Reasoning, Morgan
Kaufmann, San Mateo, CA.

16. M. West and J. Harrison (1989). Bayesian Forecasting and Dynamic Models,
Springer-Verlag, New York.

17. R. L. Winkler (1972). Introduction to Bayesian Inference and Decision, Holt,
Rinehart and Winston, Inc.



Solving Large Configuration Problems Efficiently 
by Clustering the ConBaCon Model 



Ulrich John 

Research Institute for Computer Architecture and Software Technology 
GMD FIRST, Kekulestr. 7, D-12489 Berlin, Germany 
john@first.gmd.de



Abstract. In this paper, we outline our constraint-based model for con- 
figuring and reconfiguring industrial products as well as some aspects of 
its prototypical implementation ConBaCon (Constraint-Based Configuration).
ConBaCon overcomes the deficits of existing commercial configuration
systems. Problems remain in the case of large products/technical
systems. As one way of tackling this, we present a clustering approach 
within the ConBaCon model that allows large configuration problems to 
be efficiently solved by substantially reducing the number of constraint
variables and constraints needed. 

Keywords: Constraint-based Configuration, Constraint Programming, 
Industrial Expert Systems, Constraint-based Modeling 



1 Introduction 

The computer-assisted development of industrial products is still under intensive 
research. The reasons for this are relatively long development times, the costs 
these entail, the resulting competitive disadvantages and the error-proneness 
(cf. [14,1] or [19]) of the development processes. 

Since 1981, with the introduction of the well-known rule-based configuration 
system XCON for configuring DEC computers, different approaches have been 
proposed and investigated for the knowledge-based configuration of products 
and technical systems. These include various rule-based, case-based, and recently 
more and more constraint-based approaches. Overviews of different approaches 
and systems are given in [19,18] and [16]. If research approaches as well as 
commercial systems^1 are considered, the following general deficits are found:
The problem specification is nondeclarative and hard to maintain. Often, the
sequence of interactions during the configuration process is fixed. Thus, a flexible
configuration process, as supported by ConBaCon, is impossible. The simulation
of different effects as the results of alternative interactive decisions is rare and
the support of good reconfigurations, which is needed by industry, is insufficient
or nonexistent. Furthermore, finding optimal or near-optimal configurations is
impossible, and sometimes the underlying algorithms fail to terminate, etc.

^1 CeBIT'98 fair, Hanover, Germany



R. Loganantharaj et al. (Eds.): IEA/AIE 2000, LNAI 1821, pp. 396-406, 2000.
© Springer-Verlag Berlin Heidelberg 2000






It is generally accepted that high-quality configuration systems can be re- 
alized, especially by using constraint programming (cf. [7,2,3,15]). Successful 
commercial configuration systems are always dubbed “constraint-based”, but 
this is usually misleading because such systems do not use integrated constraint 
solvers to reduce the search space; they merely process a constraint-based prob- 
lem specification by simple checking of constraints (cf. [1]). As regards genuine 
constraint-based configuration systems or research prototypes, it may be said 
that relevant publications on these approaches are often of a very general nature 
or exhibit limitations in terms of the quality of search-space reduction and the 
problem class they can handle. 

Our prototypical configuration system ConBaCon, based on the CLP language^2
CHIP, attempts to overcome the above deficits. It was largely a product
of the VERMEIL^3 project, which was concerned with developing concepts to
support the knowledge-based development of reliable control systems. Our in- 
dustrial project partner, ELPRO Prozessindustrie GmbH, a typical producer of 
energy-supply and process-control systems, served as a reference enterprise. 

The following section outlines the corresponding specification language Con- 
BaConL. Section 3 introduces some key aspects of our configuration model 
and its realization ConBaCon, which allow the configuration of industrial prod- 
ucts/technical systems. 

One way of tackling large (complex) configuration problems is by clustering 
the ConBaCon model. This idea is presented in Section 4. The paper closes with 
a conclusion and some remarks on possible future extensions. 



2 ConBaConL 

By analyzing the results of design problems for industrial control systems, we 
developed a formal problem model and, based on this, the largely declarative 
specification language ConBaConL, which allows the specification of relevant
configuration problems. These specifications are composed of three parts: object
hierarchy, context-independent constraints and context constraints. Every technical
object that can play a part in the configuration problem must be specified
in terms of its structure in the object hierarchy. An object can consist of several
components in the sense of the consist_of-relation, where components may be
optional, or the object has some specializations in the sense of the is_a-relation.
In addition, all attributes of the technical objects are specified. If the attribute 
values of a technical object are explicitly known, they will be enumerated. 

A correct context-independent representation of the configuration problem 
is created from the object-hierarchy specification by adding the specification of 
the constraints concerning, on the one hand, different attribute value sets, and 
on the other, the existence or nonexistence of technical objects in the problem 

^2 CLP = Constraint Logic Programming

^3 Funded by the German Federal Ministry for Education, Science, Research and
Technology






solution. If context constraints exist (e.g., customer-specific demands or resource- 
oriented constraints), we have to specify them as problem-specific constraints in 
ConBaConL. The distinction between problem-specific and context-independent 
constraints is useful because the technical correctness of the problem solution is 
ensured if all context-independent constraints are fulfilled. 

The constraint elements of ConBaConL can be divided into Simple Constraints,
Compositional Constraints and Conditional Constraints. Most of them
are introduced below. 

2.1 Simple Constraints 

Attribute Value Constraints and Existence Constraints 

[o, Attr, VS] / not([o, Attr, VS]) =
the attribute Attr of object o must / must not take a value from VS,

exist(Objectlist) / noexist(Objectlist) =
all objects contained in Objectlist must / must not be part of the solution.

Relational Constraints Between Attribute Value Sets, Table Constraints

eq(T1,T2), neq(T1,T2), lt(T1,T2), let(T1,T2), gt(T1,T2), get(T1,T2). Furthermore,
it is possible to specify equations over attributes.

In practice, coherences between solution parts are often specified in the form 
of tables (decision table). To avoid a manual, error-prone translation of the 
table into other kinds of ConBaConL constraints, a table constraint (see [9]) 
was introduced. 
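
To make the effect of these simple constraints concrete, the following sketch treats each attribute as a finite set of candidate values and shows how an attribute value constraint and a table constraint could narrow those sets. It is an illustration only, not ConBaCon's CHIP-based implementation; the function names, attributes and table rows are hypothetical.

```python
def apply_attribute_value_constraint(domain, value_set, negated=False):
    """[o, Attr, VS] keeps only values in VS; not([o, Attr, VS]) removes them."""
    if negated:
        return {v for v in domain if v not in value_set}
    return {v for v in domain if v in value_set}

def apply_table_constraint(domains, attrs, rows):
    """Keep only values that occur in a table row compatible with all current domains."""
    valid = [row for row in rows if all(v in domains[a] for a, v in zip(attrs, row))]
    return {a: {row[i] for row in valid} for i, a in enumerate(attrs)}

# Hypothetical attributes of one technical object.
domains = {"voltage": {230, 400, 690}, "cooling": {"air", "water"}}

# not([o, cooling, {"water"}]): water cooling is excluded.
domains["cooling"] = apply_attribute_value_constraint(
    domains["cooling"], {"water"}, negated=True
)

# Decision table listing the admissible (voltage, cooling) combinations.
rows = [(230, "air"), (400, "air"), (690, "water")]
domains = apply_table_constraint(domains, ("voltage", "cooling"), rows)
print(domains)
```

Excluding water cooling removes the 690 row from the table, which in turn removes 690 from the voltage domain; this kind of knock-on reduction is what a table constraint saves the modeller from encoding by hand.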

2.2 Compositional Constraints 

Compositional Constraints are, besides the above-mentioned Simple Constraints, 
compositions of compositional constraints: and([Cons_1, ..., Cons_n]),
or([Cons_1, ..., Cons_n]), xor([Cons_1, ..., Cons_n]),
at_least([Cons_1, ..., Cons_n], N) / at_most([Cons_1, ..., Cons_n], N) /
exact([Cons_1, ..., Cons_n], N) = at least / at most / exactly N of the listed
constraints are valid^4.

2.3 Conditional Constraints 

[if(Comp_Cons_1), then(Comp_Cons_2)] ([iff(Comp_Cons_1), then(Comp_Cons_2)])
If (and only if) the compositional constraint Comp_Cons_1 is fulfilled, the
compositional constraint Comp_Cons_2 must be fulfilled.
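
The meaning of the compositional and conditional constraints can be sketched as a small evaluator that checks a nested constraint term against a fixed set of atomic constraints known to hold. This only checks a finished assignment and performs no constraint propagation; the tuple encoding is hypothetical, and reading xor as "exactly one" is an assumption.

```python
def holds(term, facts):
    """Return True if the constraint term is satisfied, given the set of atoms that hold."""
    op = term[0]
    if op == "atom":                      # e.g. ("atom", "exist(fan)")
        return term[1] in facts
    if op == "not":
        return not holds(term[1], facts)
    if op in ("and", "or", "xor"):
        n_true = sum(holds(c, facts) for c in term[1])
        if op == "and":
            return n_true == len(term[1])
        if op == "or":
            return n_true >= 1
        return n_true == 1                # assumption: xor means exactly one holds
    if op in ("at_least", "at_most", "exact"):
        n_true = sum(holds(c, facts) for c in term[1])
        limit = term[2]
        return {"at_least": n_true >= limit,
                "at_most": n_true <= limit,
                "exact": n_true == limit}[op]
    if op == "if":                        # [if(A), then(B)]
        return (not holds(term[1], facts)) or holds(term[2], facts)
    if op == "iff":                       # [iff(A), then(B)]
        return holds(term[1], facts) == holds(term[2], facts)
    raise ValueError(f"unknown constraint operator: {op}")

# Hypothetical example: fan and filter are part of the solution, pump is not.
facts = {"exist(fan)", "exist(filter)"}
fan, filt, pump = (("atom", f"exist({x})") for x in ("fan", "filter", "pump"))

rule = ("if", ("and", [fan, filt]), ("at_most", [fan, filt, pump], 2))
print(holds(rule, facts))
```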

Other important elements of ConBaConL are preferences, terms for documenting
modifications of product taxonomies (see [10]) and optimization goals.

A typical specification of ground rectifiers for large electric motors^5, together
with the problem solution using ConBaCon, is outlined in [9].

^4 So far, the processing of or-, xor-, at_least-, at_most- and exact-constraints
concerning the existence and nonexistence demands of objects has been realized
in ConBaCon.

^5 On the basis of data provided by our industrial partner.






3 Modeling & Implementation

When transforming a problem specification, our goal is to obtain a problem- 
solution model that allows an efficient problem solution. The model should also 
support the option of high-quality interactions with the user. The model of a 
constraint-logic system over finite domains is taken as a basis for the solution 
model outlined below. Thus, the model can also be seen as a global constraint 
for structural configuration. 



3.1 Objects 

Each specified object (representing a technical module) that is not marked as 
obsolete will be transformed into a module object of the problem-solution model⁶. 
Moreover, each attribute of a specified object will be transformed into an 
attribute object, i.e., a specified object with n attributes will be represented by 
n + 1 objects in the problem-solution model (Fig. 1). 




Fig. 1. Transformation of Objects 



In contrast to our approach, parameterized modules are only rarely consid- 
ered in the configuration-related literature (cf. [19]). 

Objects of the problem-solution space acquire certain model-specific attributes. 
The attribute component_list of object o contains identifiers of the object 
components (structure_typ = and-node) and of the specializations 

⁶ Some constellations require the introduction of auxiliary module objects. These are 
not considered in the present paper. 



400 Ulrich John 



(structure_typ = or-node) of o, respectively. The constraint variable Ex_Var 
determines whether or not the object is contained in the solution. If the value 
of Ex_Var is zero, o is not part of the solution. If the value is one, o is part of 
the solution⁷. opt_typ contains information about whether o is optional or not. 
Links to the corresponding attribute objects are given by attr_pointer_list. Each 
attribute object stores possible attribute values in value_children_lists and in 
the domain of a corresponding constraint variable. Moreover, identifiers of the 
value-related children nodes are stored if the object o contains specializations. In 
this case, the attribute value sets of o are the set union of the corresponding 
attribute value sets of the specialization objects. 
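
The object transformation described above can be pictured with a small data-structure sketch. It is only an illustration under assumptions: the class names, the function transform and the sample identifiers ("rectifier", "voltage", "phases") are hypothetical and are not taken from the ConBaCon implementation, which is written in CHIP. Only the attribute names (structure_typ, opt_typ, component_list, Ex_Var, attr_pointer_list, value_children_list) follow the text.

```python
from dataclasses import dataclass, field


@dataclass
class AttributeObject:
    """One attribute of a specified object (hypothetical layout)."""
    name: str
    domain: set  # attribute values that are still possible
    # value -> identifiers of specializations that still offer this value
    value_children_list: dict = field(default_factory=dict)


@dataclass
class ModuleObject:
    """One technical module of the problem-solution model (hypothetical layout)."""
    ident: str
    structure_typ: str = "and-node"   # "and-node" or "or-node"
    opt_typ: str = "nonoptional"      # "optional" or "nonoptional"
    component_list: list = field(default_factory=list)
    ex_var: set = field(default_factory=lambda: {0, 1})  # domain of Ex_Var
    attr_pointer_list: list = field(default_factory=list)


def transform(ident, attributes, **module_fields):
    """Turn one specified object with n attributes into n + 1 model objects."""
    attr_objects = [AttributeObject(f"{ident}.{name}", set(values))
                    for name, values in attributes.items()]
    module = ModuleObject(ident,
                          attr_pointer_list=[a.name for a in attr_objects],
                          **module_fields)
    return [module] + attr_objects


# Illustrative call with made-up data: two attributes yield three model objects.
objects = transform("rectifier", {"voltage": [230, 400], "phases": [1, 3]})
```

The sketch keeps Ex_Var as a plain set {0, 1}; in a constraint-logic system it would be a finite-domain variable.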

Besides the model objects, constraints are needed in the problem-solution 
model to ensure the coherences between the objects of the model so that the 
correctness of the solution and the completeness of the solution process are 
guaranteed with respect to the problem specification. We call these constraints 
consistency-ensuring constraints (CE constraints). 

3.2 CE Constraints 

Consistency-Ensuring Constraints are realized as logical coherences between values 
of Ex_Var attributes/attribute value sets of different attribute objects. The 
most important CE constraints are schematized in Fig. 2. 




Fig. 2. Consistency-Ensuring Constraints 



If it becomes obvious that an object cannot occur in the solution, it must be 
inferred that all components/specializations of it cannot occur in the solution 
(I). If it becomes obvious that an object is part of the solution (Ex_Var = 1), it 
must be ensured that all nonoptional components of the object are part of the 
solution, too (II). The existence of an object in a solution implies in each case the 

⁷ Restriction to the values zero and one is a simplification of the realized model. There 
are actually more values that reflect the existence of different technical identifiers of 
one technical object, these depending, for instance, on the fixed parameter values. 






existence of its parent object (III). Furthermore, if a nonoptional component of 
an object o cannot occur in any solution, the parent object o cannot occur in any 
solution either (IV). If the specialization of an object o is part of the solution, 
no other specialization of o can be part of the solution (V). If it becomes obvious 
that all specializations of an object o cannot occur in any solution, it must be 
inferred that o cannot occur in the solution either (VI). 
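
The six rules (I) to (VI) can be read as propagation steps over the domains of the Ex_Var variables. The following is a minimal sketch under assumptions, not the CHIP realization: the names Node, restrict and propagate are hypothetical, each Ex_Var domain is a plain Python set, and the rules are applied repeatedly until nothing changes.

```python
class Inconsistent(Exception):
    """Raised when an Ex_Var domain would become empty."""


class Node:
    def __init__(self, name, kind="and-node", optional=False):
        self.name, self.kind, self.optional = name, kind, optional
        self.parent, self.children = None, []
        self.ex = {0, 1}  # domain of Ex_Var

    def add(self, *kids):
        for kid in kids:
            kid.parent = self
            self.children.append(kid)
        return self


def restrict(node, value):
    """Fix Ex_Var of node to value; return True if the domain changed."""
    if value not in node.ex:
        raise Inconsistent(node.name)
    changed = node.ex != {value}
    node.ex = {value}
    return changed


def propagate(nodes):
    """Apply rules (I)-(VI) until a fixpoint is reached."""
    changed = True
    while changed:
        changed = False
        for o in nodes:
            kids = o.children
            if o.ex == {0}:                                  # (I)
                for c in kids:
                    changed |= restrict(c, 0)
            if o.ex == {1} and o.kind == "and-node":         # (II)
                for c in kids:
                    if not c.optional:
                        changed |= restrict(c, 1)
            if o.ex == {1} and o.parent is not None:         # (III)
                changed |= restrict(o.parent, 1)
            if o.kind == "and-node" and any(
                    c.ex == {0} and not c.optional for c in kids):
                changed |= restrict(o, 0)                    # (IV)
            if o.kind == "or-node":
                chosen = [c for c in kids if c.ex == {1}]
                if chosen:                                   # (V)
                    for c in kids:
                        if c is not chosen[0]:
                            changed |= restrict(c, 0)
                if kids and all(c.ex == {0} for c in kids):  # (VI)
                    changed |= restrict(o, 0)
```

For instance, fixing one specialization of an or-node to 1 excludes its sibling specializations by (V) and forces the chain of parent objects into the solution by (III).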

Attribute value sets are kept consistent by a special class of CE constraints. 
In the case that a value is deleted in the attribute value set of a specialization of 
an object o, the value has to be deleted in the corresponding attribute value set 
of o, except if there is another specialization of o that contains the deleted value 
in the corresponding attribute value set⁸. If an attribute value is deleted in an 
attribute value set of an object o possessing specializations, the same value has 
to be deleted in all corresponding attribute value sets of the specializations of o. 
If an attribute value set of an object o becomes empty, the nonexistence of o will 
be inferred by a special CE constraint. 
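
The value-set rules of this paragraph can be illustrated with ordinary Python sets. This is a sketch under assumptions: the function names and the sample voltage values are hypothetical, `parent` stands for one attribute value set of object o, and `specs` maps each specialization of o to its corresponding attribute value set.

```python
def delete_in_specialization(parent, specs, spec_name, value):
    """Delete value in one specialization; delete it in o as well,
    unless another specialization still contains the value."""
    specs[spec_name].discard(value)
    if all(value not in dom for dom in specs.values()):
        parent.discard(value)


def delete_in_parent(parent, specs, value):
    """Delete value in o and in all corresponding sets of its specializations."""
    parent.discard(value)
    for dom in specs.values():
        dom.discard(value)


def objects_to_exclude(parent, specs, parent_name="o"):
    """Names of objects whose attribute value set is empty (nonexistence is inferred)."""
    empty = [name for name, dom in specs.items() if not dom]
    if not parent:
        empty.append(parent_name)
    return empty


# Illustrative data: o offers three voltages through two specializations.
voltage = {230, 400, 690}
spec_voltage = {"a": {230, 400}, "b": {400, 690}}
delete_in_specialization(voltage, spec_voltage, "a", 400)  # b still offers 400
delete_in_specialization(voltage, spec_voltage, "a", 230)  # nobody offers 230 now
```

After the two deletions the value set of specialization a is empty, so a would be excluded from the solution, while o keeps the values still offered by b.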

By integrating the introduced CE constraints in the problem-solution model, 
the structural coherences between objects of the solution model are ensured 
with respect to the existence, nonexistence and attribute value sets. Moreover, 
the constraints formulated in the problem specification must be transformed into 
constraints of the problem-solution model. 

3.3 Specified Constraints 

Attribute value constraints and existence constraints result in the deletion of 
attribute values in the problem-solution model or in the setting of Ex_Var 
attributes. Relational constraints between attribute value sets result in the 
deletion of attribute values that become invalid because of the specified relation. 
If there are other value tuples that do not fulfill the relation, appropriate 
daemons have to be generated, which check the relational constraints after 
each change to the attribute value sets in question. Table constraints define 
connections between the attribute value sets in question and existence information 
(Ex_Var) on the objects listed in the table head. Altering the attribute value 
sets or existence values results in invalidity-marking of corresponding table lines. 
If all table lines are marked as invalid, the table constraint is not satisfied. 
Conversely, it is ensured that the attribute value sets in question contain only values 
that are registered in valid table lines. Compositional Constraints are normally 
realized in the solution model by equations and inequations over corresponding 
Ex_Var attributes. For each nonexistence statement of an object, the term 
(1 - Ex_Var) is used instead of Ex_Var in the equation/inequation. Conditional 
Constraints are transformed into conditional transitions of the problem-solution 
model, which ensure the specified logical coherences within the problem-solution 
model. In order to substantially reduce the problem space within the 
problem-solution model, the contrapositions of the specified conditional constraints are 
also transformed into elements of the problem-solution model. 
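
The arithmetic reading of compositional constraints can be made concrete with a small checker. It is a sketch under assumptions: the tuple encoding, the function names and the object names ("fan", "filter", "brake") are hypothetical, and the check works on fully assigned 0/1 values rather than on constraint variables. The xor case is left out because its n-ary reading is not spelled out in the text.

```python
def term(ex, literal):
    """Ex_Var for an existence statement, (1 - Ex_Var) for a nonexistence statement."""
    obj, must_exist = literal
    return ex[obj] if must_exist else 1 - ex[obj]


def holds(constraint, ex):
    """Check one compositional constraint as an (in)equation over Ex_Var values."""
    kind, literals, *n = constraint
    total = sum(term(ex, lit) for lit in literals)
    if kind == "and":
        return total == len(literals)
    if kind == "or":
        return total >= 1
    if kind == "at_least":
        return total >= n[0]
    if kind == "at_most":
        return total <= n[0]
    if kind == "exact":
        return total == n[0]
    raise ValueError(f"unsupported constraint kind: {kind}")


# Illustrative assignment: fan and brake are in the solution, filter is not.
ex = {"fan": 1, "filter": 0, "brake": 1}
two_of_three = ("at_least", [("fan", True), ("filter", True), ("brake", True)], 2)
fan_without_filter = ("and", [("fan", True), ("filter", False)])
```

Here at_least([...], 2) becomes Ex_Var(fan) + Ex_Var(filter) + Ex_Var(brake) >= 2, and the nonexistence statement for filter contributes 1 - Ex_Var(filter) to the sum.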

⁸ To avoid intensive checking, the attribute value_children_list of the corresponding 
attribute object is checked and updated after each deletion of an attribute value. 






3.4 Implementation 

Based on the outlined problem-solution model, a flexible and efficient 
problem-solution process was realized within the prototypical configuration system 
ConBaCon using the CLP language CHIP. In particular, the object-based data 
management and the existence of Conditional Propagation Rules⁹ in CHIP facilitated 
the implementation. 

The specified configuration problem is transformed into objects of the prob- 
lem-solution model. This means that the objects of the solution model are gen- 
erated, corresponding CE constraints are inferred and set, and the specified con- 
straints are transformed into corresponding constraints of the problem-solution 
model. The value one is assigned to the Ex_Var attribute of the target object 
because the target object must exist in each solution. Thanks to the generated 
model with the model-specific CE constraints, a substantial reduction of the 
search space is guaranteed. We call the set of the currently active module ob- 
jects of the problem-solution model Configuration Space. Now, interactive user 
constraints can be given (one by one) relating to the existence or nonexistence 
of objects of the configuration space or to the shape of the corresponding at- 
tribute value sets. The user’s freedom to decide which object or attribute value 
set of the configuration space should be restricted by an interactively given user 
constraint is an outstanding feature compared with most other configuration 
models/tools. Governed by the constraints of the problem-solution model, this 
results in a new configuration space. Thus, a new cycle can start. Users can 
either give a new interactive constraint or they can delete previously given in- 
teractive user constraints. This allows the simulation of several user decisions, 
which is the requisite for a highly flexible configuration process. If no further 
interactive constraints are required, the generation of a solution can be started. 
This is done by labeling the Ex_Var attributes of the (still) active objects of 
the problem-solution model. Such labeling can be controlled by heuristics. This 
allows us to take into account preferences in the form of preference rules for 
controlling the labeling process. If the solution found is not suitable or fails to 
pass the solution quality check, further solutions can be created by backtracking. 
If a partial improvement of the solution suffices, a specific solution improvement 
can be started by specification and processing of a constraint hierarchy, i.e., the 
constraints that must be satisfied unconditionally will be specified as hard con- 
straints, and the solution parts that should, if possible, be in the new solution or 
desired attribute values will be fixed as weak constraints. The weak constraints 
can be marked with several weights. The specified constraint hierarchy will be 
processed in an error-minimization process, which results in the generation of a 
set of equivalent (hard) constraints of the problem-solution model. Information 
about the realization and application of constraint hierarchies in ConBaCon for 
partial improvement can be found in [17]. 

On closer inspection, it becomes obvious that the improvement process using 
a constraint-hierarchy transformer provides a sound basis for reconfiguration, 

⁹ Similar language elements exist in other CLP languages, e.g., Constraint Handling 
Rules in ECLiPSe. 




which is needed by industry. Such a reconfiguration approach within the Con- 
BaCon model is presented in [10] and [11]. 

4 Clustering 

When applying the realized configuration system ConBaCon to larger generated 
configuration problems¹⁰, performance problems were observed as a result of the 
huge number of generated constraint variables and associated CE constraints 
within the problem-solution models. 

Besides model distribution and dynamizing, the integration of a proper clus- 
tering within the ConBaCon model is a way to overcome the performance prob- 
lems in the case of large configuration problems. Moreover, the memory demand 
for solving configuration problems with ConBaCon will be substantially reduced. 

The main idea is to identify maximal object clusters within 
object-hierarchy specifications, where each cluster contains objects that are fully 
interdependent with regard to their existence in problem solutions. In other 
words, if and only if one object of a cluster is not included in the problem solution, 
every other cluster element is also not part of the solution. Given such an 
identification mechanism, we are able to generate one existence variable Ex_Var 
per cluster within the generated problem-solution model instead of the existence 
variables of the single objects contained in the cluster. Instead of CE constraints 
between objects of the problem-solution model, similar CE constraints between 
the existence variables of related clusters have to be generated. This way, we 
get a cluster-oriented problem-solution model that is semantically equivalent 
to the original problem-solution model. It contains considerably fewer constraint 
variables (existence variables) and associated CE constraints. 

We define an object cluster Cluster with the root o (semiformally) as a set of 
objects from the object hierarchy where the following conditions hold: 

- $o \in Cluster$, 

- $\forall x \in Cluster : o \neq x \Rightarrow \exists$ "parent-child chain" $o \rightarrow x$, 

- $\forall x \in Cluster\ \neg\exists y \in Cluster : opt\_typ(x) = \text{'optional'} \wedge x \in childrenset(y)$, 

- $\forall x \in Cluster : (\neg\exists y \in Cluster : y \in childrenset(x) \Rightarrow structure\_typ(x) = \text{'or-node'} \vee children\_list(x) = \emptyset)$ 

Based on the definition of object clusters, we can define maximal object clusters. 
It can be shown that for each object hierarchy there exists exactly one decom- 
position into maximal object clusters. This decomposition can be processed by a 
rather simple top down algorithm. Owing to space limitations, it is not presented 
here. 
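
Since the decomposition algorithm itself is not given in the paper, the following is only one possible top-down sketch and should not be read as the author's algorithm. It rests on an assumption derived from the cluster conditions and the CE constraints: a child stays in its parent's cluster exactly when the parent is an and-node and the child is not optional, and otherwise the child becomes the root of a new cluster. The function name, the dictionary-based encoding and the sample hierarchy are hypothetical.

```python
def maximal_clusters(root, children, structure_typ, opt_typ):
    """Top-down decomposition of an object hierarchy into clusters.

    children:      dict, object id -> list of child ids
    structure_typ: dict, object id -> "and-node" or "or-node" (default "and-node")
    opt_typ:       dict, object id -> "optional" or "nonoptional" (default "nonoptional")
    Returns a dict: cluster root -> list of member ids.
    """
    clusters = {}
    stack = [(root, root)]  # (object, root of the cluster it belongs to)
    while stack:
        obj, cluster_root = stack.pop()
        clusters.setdefault(cluster_root, []).append(obj)
        parent_is_and = structure_typ.get(obj, "and-node") == "and-node"
        for child in children.get(obj, []):
            # Assumed joining rule: nonoptional component of an and-node.
            same = parent_is_and and opt_typ.get(child, "nonoptional") != "optional"
            stack.append((child, cluster_root if same else child))
    return clusters


# Illustrative hierarchy with made-up identifiers.
children = {
    "supply": ["rectifier", "cooling", "monitor"],
    "rectifier": ["bridge"],
    "cooling": ["air", "water"],
}
structure_typ = {"cooling": "or-node"}
opt_typ = {"monitor": "optional"}
clusters = maximal_clusters("supply", children, structure_typ, opt_typ)
```

In this example seven module objects fall into four clusters, so four existence variables would be generated instead of seven.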

As motivated above, for each maximal cluster one existence variable will be 
generated. By analogy to the presented CE constraints (3.2) between object ex- 
istence variables, CE constraints of types I, II, IV and VI (cf. Fig. 2) between 
cluster existence variables will be generated. Instead of CE constraints of types 



¹⁰ 10,000 module objects and more in the object-hierarchy specification 






III and V, CE constraints of a new kind have to be generated. These constraints 
must ensure, for each maximal cluster and its elements with structure_typ = 
'or-node', that if and only if the cluster exists in a solution, each 'or-node' 
element of the cluster has exactly one specialization that belongs to another 
cluster which is also part of the solution. The transformation of specified 
ConBaConL constraints (3.3) must be complemented by a preprocessing step that 
replaces all object identifiers in “existence statements” of ConBaConL 
specifications with the identifiers of the corresponding clusters. 
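
The preprocessing step can be sketched as a recursive rewrite of constraint terms. The encoding is an assumption made for illustration: constraints are nested tuples, and the tags "exists" and "not_exists" are hypothetical stand-ins for the existence statements of ConBaConL, whose concrete syntax is not shown in this section.

```python
def cluster_index(clusters):
    """Map every object identifier to the identifier (root) of its cluster."""
    return {member: root
            for root, members in clusters.items()
            for member in members}


def rewrite(constraint, cluster_of):
    """Replace object identifiers in existence statements with cluster identifiers."""
    kind, *args = constraint
    if kind in ("exists", "not_exists"):
        return (kind, cluster_of[args[0]])
    # Nested constraints are tuples; other arguments (e.g. a count N) are kept.
    return (kind, *[rewrite(a, cluster_of) if isinstance(a, tuple) else a
                    for a in args])


# Illustrative data: "rectifier" belongs to the cluster rooted at "supply".
cluster_of = cluster_index({"supply": ["supply", "rectifier"], "air": ["air"]})
rule = ("if", ("exists", "rectifier"), ("not_exists", "air"))
rewritten = rewrite(rule, cluster_of)
```

After the rewrite, every existence statement refers to a cluster existence variable, so no per-object Ex_Var is needed for objects inside a cluster.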

5 Conclusion / Future Work 

We have presented our constraint-based problem-solution model ConBaCon for 
the configuration of technical systems/industrial products. An idea of the com- 
plexity of the configuration problems that can be tackled by the solution model 
was given by describing the main elements of the corresponding specification lan- 
guage ConBaConL. So far, the prototypical realization of the presented problem- 
solution model ConBaCon, in the CLP language CHIP, has proved successful in 
the field of industrial control systems for the configuration of power-supply sys- 
tems for large electric motors. By substantially reducing the search space, the 
problem-solution model - together with the underlying CLP system - allows an 
efficient configuration process that can be flexibly controlled by user interactions. 
It is ensured that each solution found is correct with respect to the problem spec- 
ification and the underlying constraint solver. In addition, the completeness of 
the solution process is guaranteed. 

The integration of a constraint-hierarchy transformer allows the computation 
of improvement instructions and preferences and provides a sound basis for our 
reconfiguration approach (see [10,11]). 

When applying ConBaCon to generated larger problems (more than 10,000 
module objects in the object-hierarchy specification), we recognized the need for 
model improvements because of the performance problems encountered. Such 
an improvement - the clustering of the problem-solution model - was presented 
as an extension of the ConBaCon model. The clustering approach allows the 
efficient solution of large configuration problems and retains the merits of Con- 
BaCon. 

By integrating a graphical problem editor into ConBaCon, the system sup- 
ports innovative design processes, which is essential in many practical design 
problems (see [8]). Another important extension of the presented problem-solu- 
tion model is distribution, which is motivated by two concerns. On the one hand, 
it is useful, for the near-optimal solution of very large complex problems, to 
develop DPS-oriented approaches¹¹. More specifically, we have to develop proper 
problem-decomposition methods and models of corresponding agent systems. On 
the other hand, it is useful to support existing team structures in 
configuration-related companies. Some work and initial ideas in this direction are 
documented in [6,13,12]. 



¹¹ DPS = Distributed Problem Solving 






Other interesting approaches for solving large configuration problems can be 
found in [15] and [4]. These provide ideas for dynamizing our extended problem- 
solution model. 



References 

1. T. Axling: Collaborative Interactive Design in Virtual Environments. www.sics.se/ 
~axling/Sdobelics.html. 

2. T. Axling, S. Haridi: A Tool for Developing Interactive Configuration Applications. 
J. of Logic Progr., 1996. 

3. E. Gelle, R. Weigel: Interactive Configuration using Constraint Satisfaction 
Techniques. Proc. of PACT’96. 

4. G. Fleischanderl, G. Friedrich, A. Haselböck, M. Stumptner: Configuring Large 
Systems Using Generative Constraint Satisfaction. IEEE Int. Systems 4/1998. 

5. Eugene C. Freuder: The Role of Configuration Knowledge in Business Process. 
IEEE Int. Systems 4/1998. 

6. L. Gupta, J. E. Chionglo, Mark S. Fox: A Constraint Based Model of 
Coordination in Concurrent Design Projects. www.ie.utoronto.ca/EIL/DITL/WET- 
ICE96/Project Coordination/. 1996. 

7. P. van Hentenryck, V. Saraswat: Constraint Programming: Strategic Directions. J. 
of Constraints, 2/1997. 

8. U. John: Constraint-Based Design of Reliable Industrial Control Systems. In 
“Advances in Systems, Signals, Control and Computers (V. Bajic, Ed.)”. IAAMSAD. 
Durban, South Africa. September 1998. 

9. U. John: Model and Implementation for Constraint-Based Configuration. Proc. of 
INAP’98. 

10. U. John, U. Geske: Reconfiguration of Technical Products Using ConBaCon. Proc. 
of WS on Configuration at AAAI’99, Orlando, 1999. 

11. U. John, U. Geske: Solving Reconfiguration Tasks with ConBaCon. Proc. of the 
12th International Conference on Industrial Applications of Prolog (INAP’99). 
Tokyo, September 1999. 

12. U. John, U. Jahnichen: Agenten-orientierte Konfiguration Industrieller Produkte. 
Proc. des Vorbereitungs-WS zum DFG-Schwerpunktprogramm “Intelligente 
Softwareagenten und betriebswirtschaftliche Anwendungsszenarien”. Ilmenau, Juli 1999. 

13. Van Parunak et al: Distributed Component-Centered Design as Agent-Based 
Distributed Constraint Optimization. Proc. of WS Constraints and Agents on 
AAAI’97. 

14. D. Sabin: www.cs.unh.edu/ccc/config. 1996. 

15. D. Sabin, E. C. Freuder: Configuration as Composite Constraint Satisfaction. Proc. 
of AAAI’96. 

16. D. Sabin, R. Weigel: Product Configuration Frameworks - A Survey. IEEE Int. 
Systems 4/1998. 

17. A. Schiemann, U. John, U. Geske, D. Boulanger: Realization and Application of 
Constraint Hierarchies for Configuration of Technical Systems with ConBaCon (In 
German). Proc. of 12th Workshop Logic Programming (WLP’97). Munich 1997. 

18. M. Stumptner: An Overview of Knowledge-Based Conf. AI Communications. Vol. 
10 No. 2, 1997. 

19. J. Tiihonen, T. Soininen, T. Mannisto, R. Sulonen: State-of-the-Practice in Product 
Configuration - A Survey of 10 Cases in the Finnish Industry. In [20], 1996. 

20. T. Tomiyama, M. Mantyla, S. Finger (Eds.): Knowledge Intensive CAD. Chapman 
& Hall, 1996. 



XProM: A Collaborative Knowledge-Based 
Project Management Tool 



Rattikorn Hewett¹ and John Coffey² 

¹ Florida Atlantic University, Boca Raton, FL 33431 

hewett@cse.fau.edu (currently visiting at 2) 

² The Institute for Human & Machine Cognition 

The University of West Florida, Pensacola, FL 32514 
jcoffey@ai.uwf.edu 

Abstract. There is a growing need for better project management to bring 
together people with diverse knowledge and skills to develop and implement 
project activities effectively and efficiently. This paper is a preliminary report 
on XProM (Expert Program Manager), a project management tool which assists 
users in project definition and control. XProM improves on most existing 
commercial tools by integrating groupware and expert system technologies into 
the concept map knowledge building tool developed at the Institute for Human 
and Machine Cognition. By employing groupware concepts, XProM facilitates 
communication, collaboration and cooperation among all involved parties in 
order to jointly identify project needs and requirements and to reduce the 
number of changes due to misunderstandings. XProM also includes a rule- 
based project control advisor which contains heuristic knowledge to guide users 
to possible corrective actions when things do not go as planned. The paper 
describes XProM's components and possible future directions for this work. 



1 Introduction 

A project is a one-time set of activities executed to accomplish a certain goal which is 
usually specified in terms of cost, schedule and performance requirements [9, 14]. 
Projects are not ongoing processes such as manufacturing products on an assembly 
line. Each project has a life cycle which encompasses several phases, has new and 
unfamiliar aspects, and is unique in that it will never be exactly repeated. Project 
management has become increasingly important for many professions including 
scientists and engineers whose jobs involve managing or participating in project 
activities. Project management involves planning, organizing, scheduling and 
controlling the resources required to finish a project as specified, on schedule, and 
within given cost and resource budgets. It integrates technology and business 
strategies and is concerned not only with technical aspects, such as techniques and 
procedures for planning, and budgeting, but also with behavioral and social factors 
such as organization, leadership and interpersonal skills. 

Today’s project management is more challenging than ever as technological 
complexity and rapidly changing, highly competitive markets result in uncontrollable, 
interacting forces which create greater risk and uncertainty. The traditional view of 

R. Loganantharaj et al. (Eds.): IEA/AIE 2000, LNAI 1821, pp. 406-413, 2000. 

© Springer-Verlag Berlin Heidelberg 2000 







project management as downstream implementation, where organizational forms and 
management procedures rely on centralized decision-making and strict adherence to 
hierarchical authority, is no longer adequate. Most often project failure is due not to 
technical problems, uncontrollable forces, or the people involved but simply to bad 
project management [9]. The managerial causes of project failure include (1) failure 
to properly control the project (i.e., knowing what to watch for, when to act, and what 
to do) and (2) inadequate project definition [14]. Unclear project needs, poor work 
breakdown structures and lack of user involvement, in an early stage, in defining 
project requirements and scope are major contributors to the instability, conflict and 
numerous change requests which lead to rework and delay. Devoting more time and 
effort up front to adequately defining the project is known to be a high-payback 
investment. 

This paper presents an approach for a project management tool that helps 
alleviate both of the above problems. In particular, we propose XProM (Expert 
Program Manager), a collaborative, knowledge-based project management tool. 
XProM uses a graphical organizer based on concept maps [12, 13] as a language 
model for representing and organizing project requirements. Allowing collaboration 
among users and the project team, XProM enhances the identification of project needs 
and facilitates their explicit representation in an easily understandable form. 

Internet technology is beginning to revolutionize information distribution for 
project management tools. Commercially available project management tools vary 
greatly in their capabilities, but while most of these tools can be used for project 
planning, many cannot be used for project control [9, 14]. Most of the tools used for 
project control can monitor project performance and give warning when variances 
(deviations from planned cost, schedule, or performance) move outside tolerable 
limits. They help project managers take corrective actions by providing timely, 
accurate reports of variances and by projecting cost and schedule based on the current 
situation or altered scenarios. Although this helps managers identify and investigate 
opportunities to better manage the project, it still requires experienced managers to 
know what to watch for and what to do. In addition to such functionality, we propose 
a proactive approach to project control by including heuristic knowledge for guiding 
users to possible corrective actions when things do not go as planned. By encoding 
experiential knowledge about project control, XProM can assist not only novice 
managers but also experienced managers who are overwhelmed with complexity and 
information or simply pressed for time. XProM contains two main modules: (1) a 
collaborative project planner/organizer module, and (2) a rule-based project control 
advisor (which has not been implemented) designed to employ various standard 
methods for identifying critical tasks, estimating resource planning, and scheduling, 
and evaluating the progress of task performance [9, 14]. Section 2 describes the 
former module and Section 3 presents the latter along with examples of heuristic 
knowledge used for project control. Related work is discussed in Section 4, and the 
following section concludes with the advantages and future directions of this work. 







2 The Collaborative Project Planner/Organizer 

The project planner/organizer of XProM is based on several extensions to a 
knowledge modeling tool, developed at the Institute for Human and Machine 
Cognition, called CMapTools [2, 7]. CMapTools provides mechanisms for users to 
construct, browse, edit and share knowledge bases in the well-known concept map 
representation introduced by Novak [12]. Concept maps are graphical representations 
consisting of nodes, representing concepts, and links, which represent relationships 
between concepts. Links can be directed or undirected. Concept maps are usually 
organized so that nodes at higher levels represent more general concepts than those at 
lower levels. Unlike directed graphs, concept map links can have labels. Concept 
maps are most useful for representing conceptual views explicitly in a form that can be 
understood easily. Thus, they tend to be less restricted than semantic nets in terms of 
how links and nodes are used. Concept maps have been used in corporate settings to 
help management understand corporate structure and to foster collaborative thinking 
about novel uses of existing products or potential new products [13]. CMapTools has 
been extended to a knowledge-sharing tool which facilitates collaboration by 
providing information exchange (based on propositions generated from concept maps 
constructed by group members) [15]. 

XProM extends CMapTools for use in project management, specifically for 
planning, organizing and tracking project activities. In the context of project 
planning, each concept represents an activity or a task in a project, and links between 
the concepts indicate ordering of activities in the project. In the context of project 
organization, for example, each concept may represent a project team member, while 
links may represent the hierarchy of management control. XProM provides 
functionality to create and update project plans. Figure 1 shows an example of a 
project plan created with XProM. Users plan the project, iteratively create activities 
and their sequences, and append information to the activity nodes in the graph 
structure. XProM can display activities in two modes: planning/organizing and 
explanation. The information associated with activity nodes may include start and end 
time, costs, and resources, as well as additional explanatory information such as text 
or spreadsheets. XProM can be accessed either through the Internet or as a 
client-server application. A user can view the status of the project at any time from any 
machine that can access XProM, and, with update permissions, can update project 
information. 

XProM shows the global context of a local view of a project using a simple 
information visualization technique [3]. A context window renders an overall global 
view of the project plan as shown in the upper left overlay of Figure 1. The 
surrounding rectangle in the global view corresponds to the portion of the graph 
visible in the local view or focus window. This rectangle can be dragged to scroll the 
focus window. The activity nodes are color coded to indicate progress of the project 
task. There are separate, configurable colors for completed activities, current 
activities, activities that are ready to be performed, and activities that cannot be started 
yet. When an activity is completed, the system changes its color to indicate its new 
status and changes subsequent activities to the ready status color. A project plan 
graph which shows paths of activities is useful for replanning project activities. The 







planner editor facilitates the iterative process of creating plans, trying alternatives, 
archiving versions, referring to previous iterations, etc. 
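
The status update described above can be sketched as a small routine over the activity graph. This is an illustration under assumptions, not XProM's implementation: the names Status and complete and the sample activities are hypothetical, and the sketch assumes that a subsequent activity becomes ready only once all of its predecessors are completed, a detail the text does not spell out.

```python
from enum import Enum


class Status(Enum):
    COMPLETED = "completed"
    CURRENT = "current"
    READY = "ready"
    NOT_READY = "not ready"


def complete(activity, status, predecessors):
    """Mark an activity as completed and promote activities that became ready.

    status:       dict, activity -> Status (updated in place)
    predecessors: dict, activity -> list of activities that must finish first
    """
    status[activity] = Status.COMPLETED
    for act, preds in predecessors.items():
        if status[act] is Status.NOT_READY and all(
                status[p] is Status.COMPLETED for p in preds):
            status[act] = Status.READY


# Illustrative plan with made-up activity names and ordering links.
predecessors = {
    "design": [],
    "releases": ["design"],
    "support_plan": ["releases"],
    "acceptance_plan": ["releases"],
}
status = {name: Status.NOT_READY for name in predecessors}
status["design"] = Status.CURRENT
complete("design", status, predecessors)
```

A user interface would then map each Status value to its configurable color when redrawing the activity nodes.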



[Figure: XProM window "ProjectPlan" with File, Edit and Views menus and a display status legend (Completed, Current, Ready, Not Ready). Activity nodes with start and end dates: Architecture and Design Strategy Development (12/15/2000 to 1/15/2001); Determine Number of Releases and Delivery Strategy (1/15/2001 to 2/15/2001); Develop Customer Support Plan (2/15/2001 to 3/20/2001); Develop Acceptance Test Plan (2/15/2001 to 3/15/2001); Develop SQA Plan (3/20/2001 to 4/15/2001); Plan Development Facilities (4/15/2001 to 5/15/2001).] 

Fig. 1. A partial project plan created with extended CMapTools in XProM. 

Collaborative tools can be characterized as asynchronous or synchronous [17]. 
XProM provides asynchronous collaboration by allowing managers to share their 
views and update project activities from different places and at different times. 
Asynchronous collaboration occurs, for example, when one manager creates or 
modifies a plan that another manager reviews at a later time. Synchronous project 
management collaboration occurs, for example, when different functional managers 
work simultaneously on the same plan. Distributed synchronous collaboration can be 
facilitated by various tools including QuickCam™ [10] and NetMeeting™ [11]. 
XProM operates in conjunction with QuickCam to provide audio and video 
collaboration, while NetMeeting allows users to open shared windows of project 
views along with individual windows to other views. 







3 The Project Control Advisor 



XProM's Project Control Advisor is a rule-based expert system which can be 
developed using the Java Expert System Shell (JESS) [6]. JESS, an offshoot of a 
NASA rule-based shell called CLIPS [16], was specifically developed to support 
rule-based expert system technology via the Internet. Because JESS, like CMapTools, 
is implemented in Java, it can easily be integrated with CMapTools and the 
collaborative project planner/organizer module. JESS provides a graphical rule editor 
for constructing and maintaining knowledge bases and offers an inference engine and 
reasoning mechanisms. 

The Project Control Advisor can be executed periodically as a background 
process behind the project manager. The Project Control Advisor can be set by the 
project manager to execute under either of two conditions. It may be launched 
routinely by a “cron” process at time intervals established by the project manager. It 
may also be executed when activities are completed. The completion of an activity 
presents an opportunity to evaluate possible reallocation of project resources. 

Controlling a project involves monitoring project performance (e.g., by 
evaluating task progress to date), detecting problems (e.g., noticing that the 
implementation of the project plan is not progressing according to the baseline 
schedule), and identifying solutions for problems to get the project back on track [8, 9, 
14]. This requires an effective system to support collaboration, communication and 
distribution of information between upper management, functional managers and 
project managers. 

Most advanced project management systems provide support and functionality for 
tracking and for alerting users of problems. Using cost, schedule, and work progress 
variance information along with concepts of earned value [9], project progress can be 
evaluated and assessed at any time. XProM’s approach also includes expert 
knowledge that can be used to guide users in identifying appropriate corrective 
actions. Ideally, XProM simulates an expert manager by guiding users through three 
stages: (1) detecting a problem and investigating its causes, (2) suggesting possible 
corrective actions, and (3) predicting consequences and tradeoffs of taking corrective 
action or allowing the problem to persist. The analysis of the last stage provides 
decision support to the project manager. Types of corrective actions include addition 
of resources, change of project scope, change of schedule, or doing nothing and 
accepting the consequences. It is important that XProM also guides users in 
identifying the impact of possible corrective actions (e.g., using “what-if” queries). 
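
The variance measures mentioned above can be illustrated with the standard earned-value formulas (cost variance = earned value - actual cost; schedule variance = earned value - planned value). The following is a minimal sketch and not XProM's implementation; the class name, the problem labels, and the sample figures are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TaskStatus:
    planned_value: float  # PV: budgeted cost of work scheduled to date
    earned_value: float   # EV: budgeted cost of work actually performed
    actual_cost: float    # AC: actual cost of the work performed

    @property
    def cost_variance(self) -> float:
        # CV = EV - AC; negative means a cost overrun
        return self.earned_value - self.actual_cost

    @property
    def schedule_variance(self) -> float:
        # SV = EV - PV; negative means behind schedule
        return self.earned_value - self.planned_value


def detect_problems(status: TaskStatus) -> list:
    """Return the problems (hypothetical labels) signalled by negative variances."""
    problems = []
    if status.cost_variance < 0:
        problems.append("cost overrun")
    if status.schedule_variance < 0:
        problems.append("behind schedule")
    return problems


# Hypothetical figures for one task at a status date.
status = TaskStatus(planned_value=100_000, earned_value=80_000, actual_cost=90_000)
print(status.cost_variance, status.schedule_variance, detect_problems(status))
```

A detection step of this kind would correspond to stage (1) above; suggesting and comparing corrective actions would still require the expert knowledge described in the text.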



3.1 Knowledge Bases for Assisting Project Control 

As an example of the knowledge that can be used in XProM's Project Control 
Advisor, consider a scenario in which XProM detects a problem such as a delay in the 
completion of a critical task, overallocation of resources, or a cost overrun. XProM 
then helps users to determine the cause of the problem by suggesting relevant factors 
which should be checked. For example, XProM contains heuristic knowledge such as: 







• Failure to start a task on time could be caused by behind-schedule completion of 
predecessor tasks or by unavailability of resources or materials. 

• Underperformance of resources could be caused by conflicting assignments (for 
human resources), insufficient staffing, absences, or unavailability of materials. 

• Slower progress than planned could be caused by a delay in starting a task, the 
speed at which resources are working, or unavailability of materials. 
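
The three heuristics above can be pictured as a lookup from a detected problem to the factors a manager should check. This is only a sketch of the idea in Python; the paper states that the advisor can be built with JESS rules, and the dictionary keys and function name below are hypothetical.

```python
# Hypothetical encoding of the three heuristics listed above.
POSSIBLE_CAUSES = {
    "late task start": [
        "behind-schedule completion of predecessor tasks",
        "unavailability of resources or materials",
    ],
    "underperformance of resources": [
        "conflicting assignments (human resources)",
        "insufficient staffing",
        "absences",
        "unavailability of materials",
    ],
    "slower progress than planned": [
        "delay in starting a task",
        "speed at which resources are working",
        "unavailability of materials",
    ],
}


def factors_to_check(problem: str) -> list:
    """Return the candidate causes for a detected problem (empty if unknown)."""
    return POSSIBLE_CAUSES.get(problem, [])


for factor in factors_to_check("late task start"):
    print("Check:", factor)
```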

Understanding the cause of a problem helps managers make decisions about 
corrective actions. For example, the delay in a certain activity might be due to illness 
of team personnel. This is likely to be a one-time occurrence that will not have 
additional negative impacts in the future and can be compensated for. In such a case, 
XProM may recommend that the manager take no action other than possibly adjusting 
details of the project schedule. In cases where the cause of delay suggests additional 
future delays and critical tasks may need to be accelerated to meet time constraints, 
XProM may suggest that the manager assign overtime work, hire additional staff, add 
shifts, use subcontractors, train existing resources to do other tasks, or shift resources 
from other projects or from non-critical tasks. XProM then guides the manager to 
analyze the tradeoffs between these actions. Some of these actions may 
not be cost-effective (e.g., excessive overtime may reduce resources’ performance), 
and XProM may also recommend rescheduling by making tasks concurrent, if 
possible, or changing the project’s scope by eliminating or simplifying tasks, or 
shortening critical tasks by changing scopes of future tasks. 



4 Related Work 

There are many commercially available project management products ranging from 
low-end tools to sophisticated systems for multi-project management. Most of these 
systems offer features such as network scheduling, resource management, budgeting, 
cost control and performance analysis. They vary considerably in capabilities, 
flexibility, degree of integration, ease of use, and interfaces. The low-end systems 
(e.g., Milestone Simplicity, Project Vision, Quick Gantt) assist in basic capabilities 
such as laying out project plans, preparing reports and producing Gantt and PERT 
charts [1]. The mid-range project management systems (e.g., Microsoft Project, 
Micro-Planner Manager and Suretrak of Primavera) provide a wide range of features 
for planning, scheduling, budgeting, resource allocation, developing Gantt charts or 
PERT/CPM analysis, and tracking [1, 8, 9]. Most of the more sophisticated systems 
are more expensive and have a substantial learning curve. These high-end systems 
(e.g., Primavera Project Planner, Artemis' Views, Open Plan, Enterprise PM, Micro 
Planner X-Pert, Qwicknet Professional, Harvard Total Project Manager, Project 
Scheduler Network) provide integrated planning, scheduling, costing, control, and 
analytical reporting [1]. 

Artemis [18] is one of the major advanced systems that offers a comprehensive 
set of tools including features that assist project control such as earned value 
management reporting, and active alert for tracking project performances. In terms of 
collaboration, Artemis' WebGlobalView provides executives and managers across the 
organization with a web-based project analysis and reporting application. However, 
this tool only allows sharing access to information and does not provide mechanisms 
to coordinate or to gather ideas as typical collaborative tools do. 
POWERTOOL.com [20] is an on-line project collaboration tool for use by the design 
and construction industry to communicate, collaborate and disseminate project 
information to teams and parties involved via the Internet. It does not include support 
for managing projects. Knowledge-based information systems have been developed 
for engineering project management [3, 5, 19] and training. These systems do not 
specifically focus on project control issues. Artemis' and Software Productivity 
Research's KnowledgePLAN is one of a very few commercial project management 
systems that exploits expert system technology to guide users through the 
development of software project estimates. KnowledgePLAN provides knowledge- 
based estimation, scheduling functionality, and "what-if" analysis to explore 
alternatives for resource allocation. While KnowledgePLAN uses knowledge for 
software project estimation, XProM uses knowledge to guide users through 
development of possible corrective actions during project control in any domain 
application. 



5 Conclusion 

We present an approach for development of XProM, a project management tool that 
improves on most existing commercial tools in three ways. First, by using a concept 
map-based graphical tool, XProM allows documents from the project definition 
process to be expressed clearly and explicitly in an easily understandable form. This 
reduces misunderstandings and unclear descriptions of project needs. Second, by 
integrating CMapTools with software and hardware that facilitates collaborations 
among managers, at various levels, and project teams, XProM enhances capability for 
gathering, updating and sharing information. It helps alleviate problems due to 
changes in requirements, one of the major causes of project failure. Finally, by 
encoding experiential knowledge about project control, XProM can assist not only 
novice managers but also experienced managers overwhelmed by project complexity, 
large amounts of information and time constraints. Most sophisticated commercially 
available project management tools concentrate on providing information to users and 
are difficult to use. XProM exploits not only management methods and technology 
but also the use of human expertise in managing projects. Acquiring and fine-tuning 
the knowledge required for project management, including project control and 
estimation of resources, are among the remaining challenges of this work. 

References 

1. A Buyer's Guide to Selecting Project Management Software, Project Manager’s 
Control Tower, http://www.4pm.com/articles/selpmsw.html 

2. Cañas, A.J., K. Lord, J. Brennan, T. Reichherzer, Knowledge Construction and 
Sharing in Quorum. In Proceedings of AI in Education, Washington D.C., 1995, 
pp. 218-225. 







3. Card, S., J. Mackinlay and B. Shneiderman, Focus and Context. In Readings in 
Information Visualization: Using Vision to Think, S. Card, J. Mackinlay and B. 
Shneiderman (eds.). Morgan Kaufmann Publications. San Francisco, CA, 1999. 

4. Conlin, J. and A. Retik, The applicability of project management software and 
advanced IT techniques in construction delays mitigation. International Journal 
of Project Management 15 (2), 1997, pp. 107-120. 

5. Conroy, G. and H. Soltan, ConSERV, a methodology for managing multi- 
disciplinary engineering design projects. International Journal of Project 
Management 15 (2), 1997, pp. 121-132. 

6. Friedman-Hill, E., Jess, the Java Expert System Shell, 
http://herzberg.ca.sandia.gov/jess/, 1997. 

7. Ford, K.M., A.J. Cañas, J. Coffey, Participatory Explanation. In Proceedings of 
the Sixth Florida AI Research Symposium (FLAIRS ’93), Ft. Lauderdale, FL, 
April, 1993. 

8. Lowery, G. Managing Projects with Microsoft Project for Windows. Van 
Nostrand Reinhold, New York, 1990. 

9. Nicholas, J. Managing Business & Engineering Projects, Prentice Hall 
Englewood Cliffs, New Jersey, 1990. 

10. Logitech Announces the QuickCam. http://www.quickcam.com/, 1999. 

11. NetMeeting, http://www.microsoft.com/windows/NetMeeting/default.ASP, 1999. 

12. Novak, J.D. and Gowin, D.B, Learning How to Learn. Ithaca, NY: Cornell 
University Press, 1984. 

13. Novak, J.D, Learning, Creating and Using Knowledge: Concept Maps as 
Facilitative Tools in Schools and Corporations. Lawrence Erlbaum and 
Associates, Mahwah, N.J., 1998. 

14. Pinto, J., Editor. Project Management Handbook. Jossey-Bass Publishers, San 
Francisco, CA., 1998. 

15. Reichherzer, T., A. Cañas, K. Ford, and P. Hayes, The Giant: A Classroom 
Collaborator. In Proceedings of the Florida Artificial Intelligence Research 
Symposium (FLAIRS' 98), Sanibel Island, Florida, 1998, pp. 136-140. 

16. Riley, G., CLIPS, A Tool for Building Expert Systems, 
http://www.ghg.net/clips/CLIPS.html, 1997. 

17. Ellis, C., Gibbs, S. and Rein, G., Groupware, CACM 34(1):38-58, Jan, 1991. 

18. The Changing Role of Project Management in IS/IT Organization, Artemis 
Management Systems, http://38.194.75.218/ISITwhitepaper.htm 

19. W, W., KIM Hawwash, JG Perry, Contract type selector (CTS): a KBS for 
training young engineers. International Journal of Project Management 14 (2), 
1996, pp. 95-102. 

20. Advanced Construction Technology, http://www.thepowertool.com. 




Building Logistics Networks Using Model-Based 
Reasoning Techniques 



Robbie Nakatsu and Izak Benbasat 

The University of British Columbia 
2053 Main Mall, Vancouver, Canada V6T 1Z2 
{nakatsu, izak}@unixg.ubc.ca 



Abstract. This paper describes an intelligent system, LogNet, used to solve 
problems in logistics decision-making. LogNet offers design guidance by 
utilizing model-based reasoning techniques. An end-user using LogNet can test 
and refine a logistics network design and iteratively request help and advice from 
the system. 



LogNet is an intelligent system that provides interactive advice to end-users on how 
to design cost-effective logistics networks. It implements its capabilities by utilizing 
a class of AI techniques known as model-based reasoning. These techniques solve 
problems by analyzing the structure and function of a system, as described by a 
symbolic model [1]. Unlike rule-based expert systems, which reason from “canned” 
rule-based associations (IF-THEN rules) to offer intelligent advice, systems 
employing model-based reasoning contain a model simulating the structure and 
function of some system. In LogNet, we model business logistics networks and 
reason about their structure in order to offer end-users advice on how to design these 
networks more effectively. 

Much of the previous research in model-based reasoning has focused on fault 
diagnosis and the troubleshooting of physical devices, such as the research work of 
Davis and Hamscher [2] in which they looked at simple electronic circuits. We have 
employed model-based reasoning techniques in a novel way to solve a class of 
problems in business logistics management. The coverage of this domain is very 
broad, so LogNet looks only at a small subset of problems in this area; namely, it 
focuses on the warehousing and distribution aspects of logistics planning: How many 
warehouses are needed in a logistics network, where should they be located, and how 
should they service customer demand? 



1. The Domain Object Model 

At the heart of LogNet is the domain object model, or the network configuration 
model. One way to view and model a logistics environment is as a network of nodes 
interconnected by transportation links. The problem of specifying the model would 
be one of specifying the network structure through which manufactured goods flow. 
To model this environment, three general types of nodes are considered: first, the 
factories, where the products are manufactured; second, the warehouses, which 
receive the finished products from the factories for storage and possibly for further 
processing; and third, the customer zones (or markets), which place orders and 
receive the desired products from the assigned warehouse(s). Product moves through 
the logistics network via different transportation options (e.g., rail, trucking, shipping, 
air), which are represented by the connections or links between the nodes. There are 
two types of transportation links: inbound links move products from factories to 
warehouses, and outbound links move products from warehouses to customers. See 
Figure 1 below for an example of a logistics network model created using LogNet. 
Squares represent factories, circles represent warehouses, and triangles represent 
customer zones. 

R. Loganantharaj et al. (Eds.): IEA/AIE 2000, LNAI 1821, pp. 414-419, 2000. 
© Springer-Verlag Berlin Heidelberg 2000 




Fig. 1. A Network Configuration Model of a Logistics Environment Using LogNet 

LogNet enables end-users to build, directly manipulate, and inspect components of 
the network model. For instance, different network configurations can be created, and 
such performance measures as cost and customer service levels can be obtained. In 
addition, end-users may click on components of the network to modify their attribute 
value(s), and test these modified values against the network benchmarks. 

Spatial reasoning on the network model will be possible to the extent that LogNet 
will contain information concerning distances between all nodes. See Table 1 for a 
sample of a five-node system of selected U.S. cities. Distances are road miles 
between two cities. Whenever an end-user connects two nodes, the system will 
automatically update the distance-between-nodes attribute for that transportation link. 





            Atlanta   Chicago   Dallas   Denver   Seattle
Atlanta        -        674       795     1398     2618
Chicago       674        -        917      996     2013
Dallas        795       917        -       781     2078
Denver       1398       996       781       -      1307
Seattle      2618      2013      2078     1307       -

Table 1: Road Mileage Between Selected U.S. Cities 
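
The automatic update of the distance-between-nodes attribute can be sketched with the mileages of Table 1. This is a minimal illustration, not LogNet's code; the names `road_miles` and `connect` and the dictionary layout of a link are assumptions.

```python
CITIES = ["Atlanta", "Chicago", "Dallas", "Denver", "Seattle"]

# Road miles from Table 1; the matrix is symmetric with a zero diagonal.
MILES = [
    [0, 674, 795, 1398, 2618],
    [674, 0, 917, 996, 2013],
    [795, 917, 0, 781, 2078],
    [1398, 996, 781, 0, 1307],
    [2618, 2013, 2078, 1307, 0],
]


def road_miles(a: str, b: str) -> int:
    """Look up the road mileage between two cities of Table 1."""
    return MILES[CITIES.index(a)][CITIES.index(b)]


def connect(links: dict, source: str, target: str) -> dict:
    """Add a transportation link and fill in its distance attribute (hypothetical layout)."""
    links[(source, target)] = {"distance": road_miles(source, target)}
    return links[(source, target)]


links = {}
print(connect(links, "Dallas", "Denver"))
```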






2. Assumptions and Benchmarks of LogNet 



The system requires the calculation of benchmarks in order to evaluate different 
network models, and provide intelligent advice to the end-user to aid in the 
development of a suitable network model. Two main benchmarks are considered: 
total cost of the network model and customer service levels. 

Network Costs. The total cost of a given network configuration will consist of three 
components: transportation costs, warehousing costs, and inventory carrying costs. 

Transportation costs are dependent on the volume transported between the two 
nodes and the distance between the nodes. There are two types of transportation 
costs: inbound (factory to warehouse) and outbound (warehouse to customer zone). 
Transportation costs, for a given transportation link, are given by the following 
function: 

inbound rate (or outbound rate) * no. of units in 1000s * distance between nodes 

Inbound and outbound rates are input parameters that can be set by the end-user. 
They are expressed as the dollar rate per 1000 units per mile. 

Warehousing costs will be a combination of fixed costs (a cost that is set by the 
end-user) and variable costs (e.g., handling costs per unit). Warehousing costs for a 
single warehouse are obtained by the following function: 

fixed costs + (no. of units stored * variable handling rate) 

Fixed costs and variable handling rates are input parameters that can be set by the 
end-user. 
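
The two cost functions above translate directly into code. The sketch below assumes the units stated in the text (rates in dollars per 1000 units per mile); the function names and the sample parameter values are hypothetical.

```python
def transportation_cost(rate_per_1000_units_per_mile: float, units: float, miles: float) -> float:
    """Cost of one transportation link: rate * units in 1000s * distance."""
    return rate_per_1000_units_per_mile * (units / 1000) * miles


def warehousing_cost(fixed_cost: float, units_stored: float, variable_handling_rate: float) -> float:
    """Cost of one warehouse: fixed costs + (units stored * variable handling rate)."""
    return fixed_cost + units_stored * variable_handling_rate


# Hypothetical inputs: 20,000 units shipped 674 miles at $0.50 per 1000 units per mile,
# stored in a warehouse with $100,000 fixed cost and $1.50 handling cost per unit.
print(transportation_cost(0.5, 20_000, 674))
print(warehousing_cost(100_000, 20_000, 1.5))
```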



Inventory carrying costs will depend on the average inventory maintained at a 
given warehouse (in-transit inventories will not be considered). In this discussion, we 
will consider only the cost of the money tied up in inventory, or the capital cost of 
inventory. Other inventory costs such as stock-out costs, inventory service costs, and 
risk are not considered. In LogNet, inventory carrying costs are estimated based on 
the square root rule.¹ This rule can be used to estimate the cost of inventory when 
consolidating a number of stocking points, n. It states the following: 



Consolidated Inventory = Decentralized Inventory / sqrt(n) 


For example, if previously a network utilized three stocking points of $500,000 each 
for a total inventory investment of $1,500,000, after consolidation the inventory 
investment is given by: 



$1,500,000 / sqrt(3) = $866,025. 



¹ The square root rule simplifies inventory consolidation, but is a useful heuristic nevertheless. 
More accurate functions may be found to estimate consolidated inventories [3]. 





Inventory carrying costs, based on the square root rule, are decreased by $1,500,000 - 
$866,025 = $633,975. Hence, the unit cost of inventory decreases when the number of 
warehouses in a logistics network is reduced. This is because the more consolidated 
the logistics network becomes (i.e., the fewer warehouses there are), the greater the 
inventory per warehouse. Because of this consolidation, the unit cost decreases due to 
economies of scale. In a similar fashion, LogNet calculates inventory based on a 
square root function to capture this cost dynamic. 
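
The worked example above can be checked with a few lines of code. This is a sketch of the square root rule as stated in the text, not LogNet's implementation; the function name is an assumption.

```python
import math


def consolidated_inventory(decentralized_inventory: float, n_stocking_points: int) -> float:
    """Square root rule: inventory after consolidating n stocking points into one."""
    if n_stocking_points < 1:
        raise ValueError("n_stocking_points must be at least 1")
    return decentralized_inventory / math.sqrt(n_stocking_points)


before = 3 * 500_000                        # three stocking points of $500,000 each
after = consolidated_inventory(before, 3)   # inventory investment after consolidation
print(round(after), round(before - after))
```

Rounded to whole dollars, the two printed values correspond to the consolidated investment and the reduction discussed in the text.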

Customer Service Levels. Customer service level is another complex and multi- 
faceted benchmark, and again, many simplifying assumptions will be made. For this 
analysis, we will assume that the logistics decision maker would like to locate a 
warehouse stocking point as close to customers as possible to improve customer 
service. Hence, we may specify customer service levels in terms of how close the 
customer zones are to the warehouses. LogNet considers the average distance 
between customer zones and warehouses as one benchmark of customer service. 
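
The customer service benchmark can be sketched as a simple mean over outbound links. The text does not say whether the average is weighted by demand, so the unweighted mean below is an assumption, as are the function name and the sample assignments (distances taken from Table 1).

```python
def average_service_distance(outbound_miles: dict) -> float:
    """Mean warehouse-to-customer-zone distance; a lower value means better service."""
    if not outbound_miles:
        raise ValueError("at least one outbound link is required")
    return sum(outbound_miles.values()) / len(outbound_miles)


# Hypothetical assignments: (warehouse, customer zone) -> road miles.
outbound = {("Dallas", "Atlanta"): 795, ("Denver", "Seattle"): 1307}
print(average_service_distance(outbound))
```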



3. Model-Based Heuristics 

How does LogNet offer intelligent advice to logisticians designing these networks? 
The approach we take is to look at the heuristics that experts use to solve logistics 
management problems. Heuristics are defined as “a short cut process of 
reasoning... that searches for a satisfactory, rather than an optimal solution. The 
heuristic, which reduces the time spent in the search for the solution of the problem, 
comprises a rule or a computational procedure which restricts the number of 
alternative solutions to a problem” [4]. 

To illustrate heuristics utilizing model-based reasoning, the following discussion 
will describe the model-based heuristics that LogNet utilizes to offer the end-user 
advice. We limit the discussion to three kinds of heuristics: 1) distance checking; 2) 
consolidation; and 3) decentralization. 

Distance Checking. The system can check (by request) whether the current network 
model is specified such that all transportation links are the shortest distance possible. 
For example, the user may want to check all inbound connections (factory-to- 
warehouse links). The inbound check (the outbound check is similar) will perform 
the following model-based reasoning on the network model: 

1) Scan the network model for each warehouse connected to a factory. 

2) Obtain the distance of that connection. 

3) Scan the network model and determine whether there is another factory that is 
closer to the warehouse. 

4) If the capacity of the factory can accommodate the warehouse’s demand, display 
a message to the end-user, conveying this information about a closer inbound factory- 
to-warehouse connection. 
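
The four steps of the inbound check can be sketched as follows. This is an illustrative reading of the procedure, not LogNet's code; the data layout (plain dictionaries), the function name, and the sample capacities and demand are assumptions. Distances come from Table 1.

```python
def inbound_check(supplier_of, miles, spare_capacity, demand):
    """Report closer factories for each factory-to-warehouse link.

    supplier_of:    warehouse -> factory currently connected to it
    miles:          (factory, warehouse) -> road miles
    spare_capacity: factory -> units it can still supply
    demand:         warehouse -> units it requires
    """
    messages = []
    for warehouse, factory in supplier_of.items():        # step 1: each connected warehouse
        current = miles[(factory, warehouse)]              # step 2: distance of that connection
        for other in spare_capacity:                       # step 3: look for a closer factory
            if other == factory:
                continue
            candidate = miles[(other, warehouse)]
            closer = candidate < current
            if closer and spare_capacity[other] >= demand[warehouse]:  # step 4: capacity test
                messages.append(
                    f"{other} is closer to {warehouse} than {factory} "
                    f"({candidate} vs {current} miles)"
                )
    return messages


miles = {("Atlanta", "Denver"): 1398, ("Seattle", "Denver"): 1307}
advice = inbound_check(
    supplier_of={"Denver": "Atlanta"},
    miles=miles,
    spare_capacity={"Atlanta": 50_000, "Seattle": 50_000},
    demand={"Denver": 20_000},
)
print(advice)
```

The outbound check would follow the same pattern with warehouses and customer zones in place of factories and warehouses.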



Consolidation of the Distribution System. It may be the case that the current 
distribution network is too decentralized (i.e., the current network contains too many 
warehouses), leading to excessive inventory carrying costs, and larger warehousing 
costs (i.e., warehouse fixed costs). In this case, the logistician may want to eliminate 
a warehouse so that all its customers are reassigned to another warehouse. 
Consolidation, on the one hand, will lead to increased outbound transportation costs 
(since the customers, formerly assigned to the eliminated warehouse, are now located 
farther away). On the other hand, inventory carrying costs are decreased since 
consolidation means that economies of scale are gained as several locations’ 
inventories are combined into one. In addition, warehouse fixed costs are decreased 
with the elimination of each warehouse. The model-based reasoning procedure would 
include the following four steps to assess these costs: 

1) Scan the network model for each active warehouse (i.e., one with customer 
zones currently assigned to it, or connected to it) 

2) Merge this warehouse with another active warehouse 

3) Assign (or connect) all the customer zones currently assigned to both 
warehouses to the newly merged warehouse. 

4) Assess the new costs (transportation, inventory, and warehousing costs) for the 
proposed consolidation 

LogNet will consider all possible consolidation opportunities between two 
warehouses. For example, if there are three active warehouses in the network model, 
W1, W2, and W3, the following consolidations would be considered: 

W1 into W2    W1 into W3 

W2 into W1    W2 into W3 

W3 into W1    W3 into W2 

A total of 6 possible mergers would be considered, one at a time. For example, for 
the consolidation W1 into W2, all customers assigned to W1 would be 
moved to W2, and W1 would be shut down. In general, if there are n active 
warehouses, n * (n-1) consolidations are considered. 

One rule to aid in the consolidation decision would assess the cost trade-off 
between increased transportation costs vs. decreased inventory carrying costs and 
warehousing fixed costs: 

If the cost savings (inventory carrying costs + warehousing fixed costs) is greater 
than the increase in transportation costs resulting from the consolidation of two or 
more stocking points 

then suggest the consolidation as a candidate to the end-user 

There are two ways that LogNet will suggest a particular consolidation. The first 
strategy will suggest the consolidation that results in the largest cost savings. 
However, the end-user may be concerned with finding a consolidation that does not 
degrade customer service levels too much. Hence, the second strategy will suggest 
the consolidation that still offers a network cost savings, but does so with the least 
damage to customer service levels. The end-user can choose which of the two 
strategies he wishes to pursue in seeking out a consolidation opportunity. Fine-tuning 
the network design may entail switching back and forth between the two strategies. 
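
The enumeration of mergers, the trade-off rule, and the two suggestion strategies can be sketched together. This is a hedged illustration: the cost evaluation itself is replaced by hypothetical precomputed figures, and the function names and strategy labels are assumptions.

```python
from itertools import permutations


def candidate_mergers(warehouses):
    """All ordered pairs (closed, kept): n * (n - 1) consolidations."""
    return list(permutations(warehouses, 2))


def suggest(evaluations, strategy="largest_savings"):
    """Pick one consolidation, or None if no merger passes the rule.

    evaluations: (closed, kept) -> (cost_savings, transport_increase, added_avg_miles)
    Rule: a merger is a candidate only if cost_savings > transport_increase.
    """
    viable = {
        merger: (savings - increase, added_miles)
        for merger, (savings, increase, added_miles) in evaluations.items()
        if savings > increase
    }
    if not viable:
        return None
    if strategy == "largest_savings":
        return max(viable, key=lambda m: viable[m][0])
    if strategy == "least_service_damage":
        return min(viable, key=lambda m: viable[m][1])
    raise ValueError(f"unknown strategy: {strategy}")


mergers = candidate_mergers(["W1", "W2", "W3"])
print(len(mergers))

# Hypothetical evaluation results for three of the six mergers.
evaluations = {
    ("W1", "W2"): (120_000, 70_000, 150.0),
    ("W3", "W2"): (90_000, 60_000, 40.0),
    ("W2", "W1"): (50_000, 80_000, 10.0),   # fails the rule: savings < transport increase
}
print(suggest(evaluations), suggest(evaluations, "least_service_damage"))
```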







Decentralization of the Distribution System. The flip side to the consolidation 
decision, of course, is that it may be cost-effective to add another warehouse to the 
network. Adding another warehouse is beneficial to the extent that the transportation 
cost savings resulting from an additional stocking point exceeds the increase in 
inventory carrying costs and the additional warehousing costs (as discussed above). 
Hence, the complement to the consolidation rule given above is: 

If the transportation cost savings is greater than the increase in inventory carrying 
costs and warehousing costs resulting from the addition of a warehouse stocking 
point 

then suggest the decentralization candidate to the end-user 

The model-based reasoning procedure would scan the entire network model and 
check every possible proposed warehouse location to determine which one would add 
the greatest net savings (transportation cost savings minus additional costs, i.e., additional 
inventory carrying costs + additional warehousing fixed costs): 

1) Scan the network model and locate a new warehouse site to be added 

2) Assign all the customer zones that are closest to this new warehouse site 

3) Assess the new costs (transportation, inventory, and warehousing costs) for the 
proposed new site. 

If the customer service levels are not being met, an additional stocking point 
should still be added, even if the transportation costs savings do not exceed the 
additional costs resulting from the addition of the warehouse. But which stocking 
point should be added to the network? Again, LogNet will consider one of two 
strategies. One strategy would suggest the stocking point that most improves customer 
service level (i.e., average distance between customers and warehouses), regardless of 
cost. The second strategy would suggest adding the warehouse that adds the lowest 
additional cost (but still improves customer service). The end-user is free to choose 
which of the two strategies he would like to see employed by LogNet. 
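
Step 2 of the decentralization procedure and the net-savings rule can be sketched as follows. This is an illustration under stated assumptions, not LogNet's code: the function names, the choice of Denver as a candidate site, and the dollar figures are hypothetical, while the distances come from Table 1.

```python
def zones_closer_to(candidate, existing, zones, miles):
    """Customer zones strictly closer to the candidate site than to any existing warehouse."""
    return [
        zone for zone in zones
        if miles[(candidate, zone)] < min(miles[(w, zone)] for w in existing)
    ]


def net_savings(transport_savings, added_inventory_cost, added_fixed_cost):
    """Transportation cost savings minus the additional inventory and fixed costs."""
    return transport_savings - (added_inventory_cost + added_fixed_cost)


miles = {
    ("Atlanta", "Chicago"): 674, ("Atlanta", "Dallas"): 795, ("Atlanta", "Seattle"): 2618,
    ("Denver", "Chicago"): 996, ("Denver", "Dallas"): 781, ("Denver", "Seattle"): 1307,
}
reassigned = zones_closer_to("Denver", ["Atlanta"], ["Chicago", "Dallas", "Seattle"], miles)
print(reassigned)

# Hypothetical cost figures: suggest the site only if the net savings are positive.
print(net_savings(90_000, 30_000, 40_000) > 0)
```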



References 

1. Kunz, J.C.: Model Based Reasoning in CIM. In Intelligent Manufacturing. 
Proceedings from the First International Conference on Expert Systems and the 
Leading Edge in Production Planning and Control, (1987) 93-112. 

2. Davis, R. and Hamscher, W.: Model-based Reasoning: Troubleshooting. In 
Shrobe, H. (Ed.), Exploring Artificial Intelligence. San Mateo: Morgan 
Kaufmann Publishers, (1988) 297-346. 

3. Ballou, R.H.: Business Logistics Management, Third Edition. Englewood 
Cliffs, NJ: Prentice-Hall (1992). 

4. Hinkle, C.L. and Kuehn, A.A.: Heuristic Models: Mapping the Maze for 
Management. California Management Review, 10, (1967) 59-68. 




A Supporting System for Colored Knitting Design 



Daisuke Suzuki^, Tsuyoshi Miyazaki^, Koji Yamada^, Tsuyoshi Nakamura^, 

and Hidenori Itoh^ 



^ Department of ICS, Nagoya Institute of Technology 
Gokiso-cho, Showa-ku, Nagoya 466-8555, Japan 
{daisuke, ykoji, tnaka, itoh}@ics.nitech.ac.jp 
^ Sugiyama Jogakuen University 



Abstract. Knitting requires both artistic sense and technical knowledge. 
We previously proposed a support system that converts a simple design 
figure, constructed by a knitting designer, into a pattern-knitting 
diagram by applying changing rules. Knitting generally uses strings of 
various colors. However, the system has so far dealt only with 
monochromatic knitting patterns, and a system that can represent colored 
knitting patterns could support many more knitting designers. 

Therefore, this paper describes a method for generating colored images of 
complete knitting patterns. 



1 Introduction 

Knitting is a kind of industrial art. Knitted work is generally produced by 
following a pattern-knitting diagram. A pattern-knitting diagram consists of 
several basic knitting stitches, which are local twinings of string and are 
standardized by JIS (Japanese Industrial Standard) [7]. Few textbooks describe 
how to construct pattern-knitting diagrams. In addition, constructing a 
pattern-knitting diagram requires artistic sense and technical knowledge of 
knitting, and it is difficult for beginners in knitting design to imagine the 
complete knitting pattern from a pattern-knitting diagram. To address this 
problem, we previously proposed the following system. The system converts a 
simple design figure, constructed by a knitting designer, into a pattern-knitting 
diagram by using changing rules. It then displays an image of the complete 
knitting, which is generated by a method of representing 3-D strings. 

However, not every design figure can be represented in this system, because the 
system allows designers to use only three simple symbols (square, triangle, and 
cross) for a design. Moreover, knitting generally uses strings of various colors, 
whereas the system has so far dealt only with monochromatic knitting patterns. 
Representing colored knitting patterns could support more knitting designers. 

For these reasons, the purpose of this paper is to propose a method for 
generating colored images of complete knitting patterns in our new system. 



R. Loganantharaj et al. (Eds.): IEA/AIE 2000, LNAI 1821, pp. 420-425, 2000. 
© Springer-Verlag Berlin Heidelberg 2000 






2 Overview of Monochromatic Knitting Design System 



[Figure 1 panels: the design figure; the pattern-knitting diagram; the image of complete knitting pattern] 


Fig. 1. The process of generating an image of a complete knitting pattern 

Figure 1 gives an overview of the monochromatic knitting design system [1]. 
The processing of this system is divided into the following three stages. 

I. Knitting designers construct a design figure by using three simple symbols 
(square, triangle, and cross). A square stands for blank space, a triangle 
stands for the outline of an object, and a cross stands for a hole inside 
an object. 

II. Rows of the simple symbols in the design figure are changed into knitting 
stitches by using changing rules, and the pattern-knitting diagram is 
generated. 

III. The pattern-knitting diagram is translated into the image of the complete 
knitting pattern by using a method of representing 3-D strings [2]. The 
completed image is then displayed and output. 

With the monochromatic knitting design system, all the designer has to do is 
construct a simple design figure, because the system generates the 
pattern-knitting diagram from the design figure automatically. In addition, 
the designer can check the completed image of the knitting before actually 
beginning to knit. 



2.1 Increase or Decrease of Loop 

Knitting consists of loops of string (see Figure 2). With some knitting
stitches, for example Omoteme, the number of loops on the row does not
change. With Kakeme, however, the number of loops on the row increases
from 6 to 7 (see Figure 2(b)), and with Migiue-nime-ichido the number of
loops on the row decreases from 6 to 5 (see Figure 2(c)). Table 1 shows the
increase or decrease of loops for each knitting stitch. Knitting follows a
strict principle that the number of loops on a row never changes. We
therefore need “changing rules” that uphold this principle.
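
The loop-count principle can be illustrated with a short sketch. This is a
minimal Python example, not the system's implementation. The changes for
Omoteme, Kakeme, and Migiue-nime-ichido follow the text; the values for the
other three stitches are assumptions based on common knitting practice, and
the names `LOOP_CHANGE`, `net_loop_change`, and `keeps_loop_count` are
hypothetical.

```python
# Change in the number of loops caused by one stitch.
LOOP_CHANGE = {
    "Omoteme": 0,
    "Urame": 0,                   # assumed value
    "Kakeme": +1,
    "Migiue-nime-ichido": -1,
    "Hidariue-nime-ichido": -1,   # assumed value
    "Nakaue-sanme-ichido": -2,    # assumed value
}


def net_loop_change(row):
    """Sum the loop changes of all stitches on one row."""
    return sum(LOOP_CHANGE[stitch] for stitch in row)


def keeps_loop_count(row):
    """A row respects the principle when increases and decreases cancel."""
    return net_loop_change(row) == 0


# One Kakeme (+1) compensated by one Migiue-nime-ichido (-1).
balanced = ["Omoteme", "Migiue-nime-ichido", "Kakeme", "Omoteme", "Omoteme"]
# One Kakeme with no compensating decrease.
unbalanced = ["Omoteme", "Omoteme", "Kakeme", "Omoteme", "Omoteme"]

print(keeps_loop_count(balanced))
print(keeps_loop_count(unbalanced))
```

A check of this kind could be run on every row of a generated
pattern-knitting diagram.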







Fig. 2. Increase or decrease of loops for each knitting stitch

Table 1. Knitting stitches and the increase or decrease of their loops

knitting stitch                              increase or decrease of loops
Omoteme, Urame                                0
Kakeme                                       +1
Migiue-nime-ichido, Hidariue-nime-ichido     -1
Nakaue-sanme-ichido                          -2



2.2 Changing Rules 



In this subsection, a method of generating pattern-knitting diagrams from
design figures is described. The changing rules are applied to each row of
the design figure. First, when there is no × on the row, the system changes
□ to S and △ to H (see Figure 3(a)). Next, when there is a × on the row, the
system changes × to ○ (Kakeme). When Kakeme is placed on a row, the number
of loops on the next row increases. As mentioned above, knitting follows a
strict principle that the number of loops on a row never changes. Therefore,
Migiue-nime-ichido or Hidariue-nime-ichido is also placed on the row to
compensate (see Figure 3(b)).

Figure 3 shows examples of applying the changing rules.



[Figure 3: panels (a) to (e), each showing a row of the design figure and the
row of knitting stitches produced from it]

Fig. 3. Examples of applying changing rules

2.3 A Method of Representing 3-D Strings from Knitting Patterns 

In this subsection, a method of representing 3-D strings from knitting
patterns is described. The method uses the string diagram database: each
knitting stitch is replaced with the corresponding 3-D string diagram, which
is looked up in the database.






[Figure 4: table pairing each knitting stitch with its string diagram]

Fig. 4. The string diagram database

3 A Method Generating Colored Images of Complete 
Knitting Pattern 

We have already proposed a supporting system for knitting design and, as
shown above, this system can support knitting designers. However, it cannot
generate complicated design figures, because it allows designers to use only
three simple symbols (square, triangle, and cross) for design.

In addition, images of complete knitting patterns have only one color in this
system, whereas knitters generally use strings of various colors. Representing
colored images could support more users. Therefore, this section describes a
method of generating colored images of complete knitting patterns.

3.1 The Application of Color Data to Pattern-Knitting Diagrams 

Figure 5 shows how color data are applied to pattern-knitting diagrams. The
process of applying color data is divided into the following three stages.




[Figure 5: three panels: the color figure, the colored pattern-knitting
diagram, and the colored image of the complete knitting pattern]

Fig. 5. A process of applying color data to pattern-knitting diagrams



I. For a pattern-knitting diagram of size m x n, designers prepare a color
figure of the same size (m x n).

II. Each color in the color figure is applied to the corresponding knitting
stitch in the pattern-knitting diagram, which generates the colored
pattern-knitting diagram.

III. The colored pattern-knitting diagram is translated into the colored image
of the complete knitting pattern, which is then displayed and output.
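
Stages I and II above can be sketched as follows. This is a minimal Python
illustration under stated assumptions: both the pattern-knitting diagram and
the color figure are represented as lists of rows, colors are plain strings,
and each color is paired with the stitch at the same row and column. The
function name `apply_colors` is hypothetical and does not come from the paper.

```python
def apply_colors(diagram, color_figure):
    """Pair every stitch with the color at the same (row, column) position."""
    same_shape = len(diagram) == len(color_figure) and all(
        len(stitches) == len(colors)
        for stitches, colors in zip(diagram, color_figure)
    )
    if not same_shape:
        raise ValueError("the color figure must have the same m x n size")
    return [
        list(zip(stitches, colors))
        for stitches, colors in zip(diagram, color_figure)
    ]


# A 2 x 3 pattern-knitting diagram and a color figure of the same size.
diagram = [
    ["Omoteme", "Kakeme", "Omoteme"],
    ["Urame", "Omoteme", "Urame"],
]
color_figure = [
    ["red", "red", "white"],
    ["white", "red", "red"],
]

colored_diagram = apply_colors(diagram, color_figure)
for row in colored_diagram:
    print(row)
```

Stage III, the translation into a 3-D image, is not covered by this sketch.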

3.2 Problem in Processing of Applying Color Data 

Figure 6(a) is an example in which color data are applied directly to a
pattern-knitting diagram. As it shows, the color of a string changes suddenly
and unnaturally at each joint between knitting stitches. We therefore need to
change each unnatural joint into a natural joint in the design (see
Figure 6(b)).




Fig. 6. Natural and unnatural joint 



3.3 A Method of Changing Unnatural Joint into Natural Joint for 
Each Knitting Stitch on Design 

Figure 7 shows how an unnatural joint is changed into a natural joint for each
knitting stitch in the design.




[Figure 7: one panel per knitting stitch: Omoteme, Urame, Kakeme,
Migiue-nime-ichido, Hidariue-nime-ichido, and Nakaue-sanme-ichido]

Fig. 7. Changing unnatural joint into natural joint for each knitting stitch on
design








4 Output Example 



[Figure 8: two panels: the colored pattern-knitting diagram and the colored
image of the complete knitting pattern]

Fig. 8. Output example



5 Conclusion 

In this paper, we solved the problem that arises when color data are applied
to pattern-knitting diagrams and proposed a supporting system for colored
knitting design. The output example illustrates the usefulness of colored
strings. In future work, we will increase the number of knitting stitch types
that the system handles.

References 

1. Funahashi, T., Iwatsuki, Y., Mutoh, A., Yamada, M. and Itoh, H.: The
Supporting System for Knitting Design Using Rule-base, Trans. of IPSJ, Vol. 39,
No. 8, pp. 2547-2550 (1998).

2. Itoh, Y., Yamada, M., Miyazaki, T., Seki, H. and Itoh, H.: Processing for
Knitting Patterns Using a Representation Method for 3D String Diagrams, Trans.
of IPSJ, Vol. 37, No. 2, pp. 249-258 (1996).

3. Itoh, Y., Yamada, M., Miyazaki, T., Kunitachi, T., Fukumura, Y., Seki, H. and
Itoh, H.: A Transformation Technique from Symbolic Media to 3-dimensional
Patterns for Knitting, Proceedings of the Multimedia Japan 96, pp. 338-345
(1996).

4. Yamada, M., Itoh, Y., Seki, H. and Itoh, H.: An Implementation of a
Knit-Pattern Generating System for Supporting Knit Designing, Trans. of IPSJ,
Vol. 36, No. 11, pp. 2728-2735 (1995).

5. Yamada, M., Budiarto, R., Itoh, H. and Seki, H.: A String Diagram
Transformation Process using a Generic Algorithm - A Cat’s Cradle Diagram
Generating Method -, Proc. of PRICAI’94, Vol. 1, pp. 429-434 (1994).

6. Budiarto, R., Yamada, M., Seki, H., Itoh, K., Miyazaki, T. and Itoh, H.: A
System for Simulating and Display Magic Game in Cat’s Cradle and a
Characterization Method of Its String State, FORMA, Vol. 12, No. 1, pp. 75-89
(1997).

7. Letter Symbols for Knitting Stitch, JIS L 0201 (1978).





Learning Middle-Game Patterns in Chess
A Case Study



Miroslav Kubat¹ and Jan Zizka²

¹ Center for Advanced Computer Studies
University of Louisiana at Lafayette
Lafayette, LA 70504-4330, USA
mkubat@cacs.usl.edu
² Department of Information Technologies
Faculty of Informatics, Masaryk University
Botanicka 68a, 602 00 Brno, Czech Republic
zizka@informatics.muni.cz



Abstract. Despite the undisputed strength of today’s chess-playing programs,
the fact that they have to evaluate millions, or even billions, of different
positions per move is unsatisfactory. The amount of “computation” carried out
by human players is smaller by orders of magnitude because they employ
specific patterns that help them narrow the search tree. A similar approach
should in principle also be feasible in computer programs. To draw attention
to this issue, we report our experiments with a program that learns to
classify chessboard positions that permit the well-known bishop sacrifice at
h7. We discuss some problems pertaining to the collection of training
examples, their representation, and pre-classification. Classification
accuracies achieved with a decision-tree based classifier are encouraging.

Keywords: computer chess, pattern recognition, concept learning 



1 Introduction 

Since Shannon’s seminal paper [8], most chess-playing programs have relied on 
brute-force search. Their strength has grown with faster computer hardware, 
more sophisticated evaluation functions, and with the use of opening books and 
endgame databases. However, the fact that remarkable playing strength has been 
achieved by sheer number crunching is disturbing for the field of artificial intel- 
ligence. Whereas Deep Blue needed to evaluate billions of different positions, 
Kasparov hardly considered more than a few dozen per move. What secret en- 
abled him to think so much more effectively? 

Psychological studies conducted by de Groot a few decades ago [1] showed
that advanced chess players rely on thousands of patterns learned from
experience and instruction. Each pattern either immediately points to a
specific move, or at least weeds out many irrelevant ones, thus drastically
narrowing the search tree. Alternatively, realizing that the current position
can be transformed into one that is known as won, the player can choose to
pursue only those variations that contribute to the transformation.

R. Loganantharaj et al. (Eds.): IEA/AIE 2000, LNAI 1821, pp. 426-433, 2000.
© Springer-Verlag Berlin Heidelberg 2000

Put into the terminology of artificial intelligence, the search carried out by a 
human player has a low branching factor that permits calculation of long vari- 
ations. Lacking the knowledge of patterns, computer programs examine at each 
move so many continuations that the resulting high branching factor precludes 
deeper analysis (as expressed in terms of the number of moves). Endowing the 
program with the ability to recognize useful patterns might therefore increase 
the playing strength of a computer by rendering its behavior more “intelligent.” 

However, direct hard-coding of middlegame patterns is difficult because chess
masters are notoriously inarticulate about how precisely they perceive the
positions on the board. An alternative is to induce the patterns from their
positive and negative examples by means of an appropriate machine-learning
technique¹. Previous research has predominantly focused on induction of
endgame patterns [9,10] whereas attempts to address middlegame patterns are
rare [2,4]. Yet it should also be possible to learn at least the patterns
underlying some typical sacrificial attacks in the middlegame.

We present a case study that reports our work on a computer program whose
task is to decide (without calculating variations) whether or not the
well-known bishop sacrifice at h7 is correct in a given position. The paper
discusses two fundamental issues: (1) how to collect the training examples
(including their classification); and (2) how to represent the examples for
the needs of machine learning. We briefly summarize experimental results that
we believe encourage further research.

2 The Training Set 

The first step in any concept-learning undertaking is to assemble the training
set containing the positive and negative examples of the given “concept” (in
our case, a chessboard pattern that guarantees the correctness of the given
sacrifice). This issue splits into two questions: where to obtain
pre-classified training instances and how to represent them in computer
memory.



2.1 Collecting the Examples 

We extracted the learning examples from commercial chess databases. In the
basic search command, we requested positions with a white bishop on h7, black
king on g8, and black pawns on f7, g7 (or a black bishop on h2, white king on
g1, and white pawns on f2, g2), ignoring the configuration of other pieces. If
the sacrifice was carried out by Black, we took the mirror image of the
position in order to obtain a unified training set where the sacrifice is
always carried out by White. After preliminary data collection, we manually
selected positions where the bishop has just moved to h7 as a result of an
intended sacrifice and we removed duplicates (identical positions).
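
The mirroring step can be sketched in a few lines. This is a minimal Python
illustration, not the authors' code. It assumes that a position is a
dictionary from square names to piece letters, with uppercase for White and
lowercase for Black (a FEN-like convention chosen here), and that "mirror
image" means reflecting the ranks and swapping the colors. The function name
`mirror_position` is hypothetical.

```python
def mirror_position(position):
    """Reflect ranks (1 <-> 8) and swap piece colors, keeping files."""
    mirrored = {}
    for square, piece in position.items():
        file, rank = square[0], int(square[1])
        mirrored[f"{file}{9 - rank}"] = piece.swapcase()
    return mirrored


# Core of a sacrifice carried out by Black: black bishop on h2,
# white king on g1, white pawns on f2 and g2.
black_sacrifice = {"h2": "b", "g1": "K", "f2": "P", "g2": "P"}

# After mirroring, the same pattern appears as a sacrifice by White.
print(mirror_position(black_sacrifice))
```

Applying the function twice returns the original position, which is a
convenient sanity check.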



¹ For an overview of machine-learning approaches to chess programming, see [3].






The next step was to pre-classify the positions as positive and negative
examples. This was not simple because the mere fact that White has won the
game does not necessarily imply that the sacrifice is correct. Conversely,
White can possibly lose even if the sacrifice is fully justified: the actual
outcome may have resulted from a later mistake by either player. As a
criterion for correctness, we required that the sacrifice (whether accepted or
declined) significantly improved White’s game. Following this rule, we tested
the positions using one of the best commercial chess programs, ChessBase’s
Fritz 5.32. Still, the classifications are not guaranteed to be noise-free.

Another aspect worth consideration is that White can win thanks to either 
a successful mating attack or a decisive material advantage. The mating attack 
itself can involve different mating schemes, a fact that the binary classifica- 
tion [correct, incorrect] ignores. Increasing the number of classes into which the 
training examples are pre-classified does not solve the problem because then 
the numbers of examples representing each particular class are much too low to 
support any conclusions about the final system’s classification behavior. For all 
these reasons, the learning task can be regarded as difficult. 



[Figure 1: two chessboard diagrams. Left: Paulsen-Schwarz, 1879 (positive
example). Right: Zukertort-Steinitz, 1886 (negative example).]

Fig. 1. Two examples, taken from real games and included in the training set.
The diagrams capture the positions just before the bishop sacrifice at h7.



To give the reader an idea of what patterns can underlie the concept “correct
h7-sacrifice,” the left part of Figure 1 shows a positive example. The
position is taken from the game Paulsen-Schwarz (Leipzig, 1879). The presence
of the white knight on f3 enables White (after Ng5) to control the squares f7
and h7, while f8 is blocked by the black rook. Moreover, the white rook on c5
can quickly intervene (after Rc3) along the third rank that will be freed by
Bxh7 and Ng5. White’s pawns on d4 and e5 obstruct Black’s pieces. The game
continued 16.Bxh7! Kxh7 (after 16...Kh8 17.Ng5 Black loses quickly) 17.Ng5+
Kg6 (17...Kg8 18.Qh5) 18.Qg4 and Black gave up on the 27th move.

As a negative example, the right-hand part of Figure 1 shows a position
from the 11th game of the World Championship match between Zukertort and
Steinitz (New Orleans, 1886). Black has a small material advantage and White,
whose pieces are trained on Black’s castled position, opted for the sacrifice.
The decision was wrong because the black king was able to escape through f7
(after f6). The game continued 17...Kxh7 18.Qh5+ Kg8 19.Rh3 f6 20.Qh7+ Kf7 and
Black eventually won on the 42nd move.

The training set that we obtained from chess databases consisted of 85
positive and 39 negative examples. The fact that negative examples are
under-represented can adversely affect learning performance [5]. We therefore
created additional synthetic negative examples by changing certain irrelevant
features of existing positions (while preserving the relevant ones). The
changes affected the positions of pawns, sometimes also pieces. This approach
is in accordance with the findings of psychology: chess players quickly
discern a pattern consisting of a few relevant positional elements without
being confused by the irrelevant aspects. The changes were carried out by a
human expert and the added positions were, again, tested with Fritz 5.32 to
make sure that each of them was still a negative example. We have thus
increased the total number of positions from 124 to 200 (85 positive and 115
negative examples).

The h7 sacrifice is well known, and as such is rarely encountered in
tournament play (although it is frequently considered in variations the
players calculate in their minds). This explains why so few examples from real
games were found. Creating artificial examples is laborious and perhaps not
quite justified: by synthesizing them, the researcher inevitably introduces
his or her own bias.

2.2 Representing the Examples 

One of the crucial decisions in concept learning is the question how to represent 
the examples [6]. To keep the task simple, we decided to work with attribute 
vectors. As for the definition of the individual attributes, we experimented with 
the three mechanisms described below. 

Trivial Representation. A straightforward way to encode a chess position
is by a vector of 64 symbolic attributes, one per square, starting at the
upper left corner and ending at the lower right corner of the chessboard. The
elements of the vector therefore correspond to the following sequence of
squares: a8, b8, ..., h8, a7, b7, ..., h7, ..., a1, b1, ..., h1.

Each attribute acquires one out of 13 different symbolic values: ‘0’ for an
empty square, ‘1’/‘11’ for a white/black pawn, ‘2’/‘12’ for a white/black
knight, ‘3’/‘13’ for a white/black bishop, ‘4’/‘14’ for a white/black rook,
‘5’/‘15’ for the white/black queen, and, finally, ‘6’/‘16’ for the white/black
king.

For example, the ranks 8 and 7 in the right part of Figure 1 will be encoded
as (14,0,0,0,14,0,16,0) and (11,11,11,0,0,11,11,11). The first rank will be
encoded as (0,0,0,0,0,4,6,0).
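
The trivial representation can be sketched as follows. This is a minimal
Python illustration under stated assumptions: a position is a dictionary from
square names to piece letters (uppercase for White, lowercase for Black), the
symbolic values are written as integers, and the names `PIECE_CODE` and
`encode_trivial` are hypothetical. The example position contains only the
pieces implied by the three encoded ranks quoted above; the rest of the board
is left empty because the diagram is not available here.

```python
FILES = "abcdefgh"
PIECE_CODE = {"P": 1, "N": 2, "B": 3, "R": 4, "Q": 5, "K": 6}


def encode_trivial(position):
    """Return 64 values in the order a8, b8, ..., h8, a7, ..., h1."""
    vector = []
    for rank in range(8, 0, -1):
        for file in FILES:
            piece = position.get(f"{file}{rank}")
            if piece is None:
                vector.append(0)              # empty square
            else:
                code = PIECE_CODE[piece.upper()]
                # Black pieces use the white code plus 10.
                vector.append(code if piece.isupper() else code + 10)
    return vector


partial_position = {
    "a8": "r", "e8": "r", "g8": "k",
    "a7": "p", "b7": "p", "c7": "p", "f7": "p", "g7": "p", "h7": "p",
    "f1": "R", "g1": "K",
}

vector = encode_trivial(partial_position)
print(vector[0:8])     # rank 8
print(vector[8:16])    # rank 7
print(vector[56:64])   # rank 1
```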






Detailed Representation. The previous formalism provides only a static image
of the position. To capture the mobility of the pawns and pieces, as well
as the relations between them, we experimented with a more detailed encoding
where each square was represented by 71 binary attributes,
$a_i \in \{-1, +1\}$, $i = 1, 2, \ldots, 71$. This means that any position is
encoded by 71 x 64 = 4544 binary attributes. Again, the first 71-tuple
characterizes the square a8, the next one characterizes b8, and the last
71-tuple characterizes h1.

Let A denote the square that is characterized by a given tuple. The first
seven attributes in the tuple indicate the “status” of A: empty, occupied by
a white pawn, occupied by a black knight, and so on. The next 64 attributes
specify, for each square on the board, whether it can be reached by a legal
move of the piece on A (if A is empty, then all these 64 attributes will
acquire -1). The elements of the 71-tuple are defined in Table 1.

Table 1. Definition of the 71 attributes characterizing each square in the
“detailed representation.”

a_1  = +1 ... a white piece on the square, -1 ... a black piece on the square
a_2  = +1 ... a Pawn on the square, otherwise -1
a_3  = +1 ... a Knight on the square, otherwise -1
a_4  = +1 ... a Bishop on the square, otherwise -1
a_5  = +1 ... a Rook on the square, otherwise -1
a_6  = +1 ... a Queen on the square, otherwise -1
a_7  = +1 ... a King on the square, otherwise -1
a_8  = +1 ... the square a8 can be reached by the piece on A, otherwise -1
...
a_71 = +1 ... the square h1 can be reached by the piece on A, otherwise -1

For illustration, the vector defining a position with black pieces Ra8, Qd8,
Pa7 would begin with a 71-tuple characterizing the square a8 and the mobility
of Ra8. This means that the first attributes in the corresponding vector will
acquire the following values:
(-1, -1, -1, -1, +1, -1, -1, +1, +1, +1, -1, ...): the squares accessible for
Ra8 are a8, b8, and c8, whereas all remaining 61 squares are inaccessible for
the Rook, so the value for each of them is -1.
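
The construction of one 71-tuple can be sketched as follows. This is a minimal
Python illustration, not the authors' code. It assumes that the set of
reachable squares is supplied by the caller (no move generator is included),
that pieces are written as letters with uppercase for White and lowercase for
Black, and that an empty square is encoded with all attributes set to -1,
which the text does not specify for the first seven attributes. The name
`square_tuple` is hypothetical.

```python
FILES = "abcdefgh"
# Board squares in the order a8, b8, ..., h8, a7, ..., h1.
SQUARES = [f"{file}{rank}" for rank in range(8, 0, -1) for file in FILES]
PIECE_TYPES = "PNBRQK"   # order of attributes a_2 ... a_7


def square_tuple(piece, reachable):
    """Build the 71 attributes for one square A.

    piece: letter of the piece on A, or None if A is empty.
    reachable: squares treated as reachable by the piece on A.
    """
    if piece is None:
        return [-1] * 71                     # assumption for empty squares
    color = [+1 if piece.isupper() else -1]                       # a_1
    kind = [+1 if piece.upper() == t else -1 for t in PIECE_TYPES]  # a_2..a_7
    mobility = [+1 if sq in reachable else -1 for sq in SQUARES]    # a_8..a_71
    return color + kind + mobility


# The Ra8 example from the text: a8, b8, and c8 are marked as accessible.
ra8 = square_tuple("r", {"a8", "b8", "c8"})
print(ra8[:11])
```

A full position vector would concatenate 64 such tuples, one per square, in
the same a8-to-h1 order.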



Expert Attributes. Mechanical representations tend to ignore such aspects 
as cooperation of pieces, the possibility to move a piece into attacking position 
within two or three moves, or a possibility of an immediate recapture of a sac- 
rificed piece. In the last representation scheme, we used 59 binary attributes 
defined by a human expert. These attributes included: white king is castled; 
white queen is active on a particular diagonal/rank/file, white or black bishops 
are active on their diagonals, black knight defends black king’s position, black 
king can escape through f8 or f7, white pawn controls f6, white rook controls an 
open file, and so on. 






3 Experiments 

For the induction task, we chose Quinlan’s C5 [7], which is a somewhat more
versatile successor of the popular C4.5. We experimented with the following
four options provided by the package: induction of decision trees (C5),
induction of rules (C5r), induction of decision trees with boosting (C5b), and
induction of rules with boosting (C5rb). In the “boosted” options, we used 10
classifiers.

To minimize the effects of statistical fluctuations, we used, in each
experiment, 10 random runs of 5-fold cross-validation. As the primary concern
here is to assess the utility of machine learning for pattern recognition in
chess middlegames, we only report mean values with standard deviations,
without making any claims about statistical significance.
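
The evaluation protocol can be sketched as follows. This is a minimal Python
illustration using only the standard library, not the C5 package. It assumes
that the mean and standard deviation are taken over the per-run error rates,
which the text does not state explicitly. The names `repeated_cv_error` and
`train_majority` are hypothetical, and the majority-class learner only stands
in for a real inducer.

```python
import random
import statistics


def repeated_cv_error(examples, labels, train, runs=10, folds=5, seed=0):
    """Return (mean, std) of the error rate in % over repeated k-fold CV."""
    rng = random.Random(seed)
    n = len(examples)
    run_errors = []
    for _ in range(runs):
        order = list(range(n))
        rng.shuffle(order)                    # new random split in every run
        mistakes = 0
        for k in range(folds):
            test_idx = order[k::folds]        # every folds-th example
            held_out = set(test_idx)
            train_idx = [i for i in order if i not in held_out]
            classify = train([examples[i] for i in train_idx],
                             [labels[i] for i in train_idx])
            mistakes += sum(classify(examples[i]) != labels[i]
                            for i in test_idx)
        run_errors.append(100.0 * mistakes / n)
    return statistics.mean(run_errors), statistics.pstdev(run_errors)


def train_majority(train_examples, train_labels):
    """Stand-in learner: always predict the most frequent training label."""
    majority = max(set(train_labels), key=train_labels.count)
    return lambda example: majority


# Class sizes of the original training set: 85 positive, 39 negative.
examples = list(range(124))
labels = ["correct"] * 85 + ["incorrect"] * 39

mean_err, std_err = repeated_cv_error(examples, labels, train_majority)
print(f"{mean_err:.1f} +/- {std_err:.1f}")
```

The majority-class baseline misclassifies exactly the 39 negative examples in
every run, so it gives a reference point against which the error rates of a
learned classifier can be read.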

The results are summarized in Table 2 where each row stands for one of the 
four aforementioned options of the machine learning software. We considered 
the three representation schemes discussed in Section 2.2, and for each of them 
carried out separate experiments with the original training set (124 examples) 
and with the enlarged training set (200 examples). 



Table 2. Error rates (in %) achieved by four different learning algorithms for
various training sets. Each field is obtained from 10 different runs of 5-fold
cross-validation.

encoding         trivial                detailed               expert
           124 ex.    200 ex.     124 ex.    200 ex.     124 ex.    200 ex.
C5         34.4±4.8   27.9±2.6    32.2±3.7   20.9±3.1    33.1±2.9   16.0±2.5
C5r        32.4±3.5   26.9±2.1    31.5±3.1   21.3±3.5    31.5±2.7   17.8±2.5
C5b        32.6±2.5   15.9±2.5    30.9±2.1   15.7±3.7    27.7±2.8   12.3±1.6
C5rb       31.7±2.3   16.1±2.3    31.4±2.4   15.2±4.0    28.0±1.9   13.2±1.9



The results show that boosting always helped increase the accuracy. This
indicates that the classification task is indeed difficult and cannot be
straightforwardly expressed in simple rules or decision trees. Each of the
three representation schemes tended to give about the same results as long as
we worked with the smaller training set (124 examples): the examples were
simply too sparse to benefit from an improved representation. For the enlarged
set (200 examples), the impact of representation was obvious, the best results
being achieved with the expert attributes (the last two columns in the table).

The fact that the boosted version of decision trees can reach nearly 88%
classification accuracy (an error rate of 12.3%) is encouraging. The
experiments bear promise that a reasonably accurate classifier can be induced,
provided that a sufficiently large set of training examples has been
assembled. An important aspect of the learning procedure is the way the
examples are described.






On a final note, boosted algorithms outstripped plain learners but we did 
not observe any significant difference between the performance of decision trees 
and rules. 

4 Conclusions 

The case study reported in this paper provides some evidence that certain mid- 
dlegame patterns underlying well-known sacrificial combinations can be learned 
from a set of positive and negative examples. This finding can prove useful in 
attempts to increase effectiveness of chess-playing programs: instead of routine 
calculation of countless variations, the agent would first search for patterns that 
can narrow the search tree. We surmise that the patterns can be beneficial also 
in attempts to improve the explanation abilities of computer programs. 

Experimental evidence indicates that, at least for the pattern investigated 
in this work, success can be expected only if the following prerequisites are 
satisfied: (1) the researcher is able to prepare a sufficiently large training set; (2) 
a reasonable representation of the examples has been found; and (3) a sufficiently 
powerful learning algorithm, such as C5 with boosting, has been employed. 

A scientist has to be careful not to jump to conclusions. The achieved clas- 
sification performance (87.7%) has only indicative value. On the one hand, an 
improved choice of attributes, and a much larger training set, are likely to lead 
to further improvement. On the other hand, more attention has to be devoted 
to the question how to synthesize artificial examples from existing ones without 
skewing the results towards more favorable numbers. More research, and more 
extensive experimentation, are needed. 

In our future work, we first of all plan to experiment with some other typ- 
ical sacrifices to provide more experimental evidence of the feasibility of the 
approach. Advanced chess players have mastered hundreds of such patterns and 
so should machines. Second, we want to devote more attention to methods for 
the synthesis of artificial examples and for their representation. Finally, as the 
claim of this paper is that machine learning can improve the performance of 
chess playing programs, we plan experiments that will measure the savings in 
calculations brought about by the knowledge of the given pattern. 

Chess players might be able to formulate at least “rough-and-ready” rules 
of thumb. The fact that we ourselves could not hand-code any rules that would 
match the performance of the classifiers induced by C5 can be due to our inade- 
quate knowledge of the game. Systematic experiments with patterns formulated 
by masters and grandmasters would be helpful. We encourage other researchers 
to study these aspects. 

Acknowledgement 

The research reported in this paper was partly supported by the Louisiana Board
of Regents and by the National Science Foundation under the grant number
NSF/LEQSF-SI-JFAP-06. A significant part of the research was carried out
during J. Zizka’s visit at the University of Louisiana at Lafayette.






References 

1. de Groot, A. D. (1965). Thought and Choice in Chess, Mouton, The Hague.

2. Flinter, S. and Keane, M. T. (1995). Using Chunking for the Automatic
Generation of Cases in Chess. Proceedings of the 1st International Conference
on Case Based Reasoning, ICCBR-95, Springer Verlag.

3. Fürnkranz, J. (1996). Machine Learning in Computer Chess: The Next
Generation. International Computer Chess Association Journal, 19, 147-160.

4. Gobet, J. and Jansen, P. (1994). Towards a Chess Program Based on a Model of
Human Memory. In H. J. van den Herik et al. (eds.) Advances in Computer Chess,
7, pp. 35-60, University of Limburg.

5. Kubat, M. and Matwin, S. (1997). Addressing the Curse of Imbalanced Training
Sets: One-Sided Selection. Proceedings of the Fourteenth International
Conference ICML’97. July 8-12, 1997, Nashville, Tennessee, pp. 179-186.

6. Mitchell, T. M. (1996). Machine Learning, McGraw-Hill.

7. Quinlan, J. R. (1996). Bagging, Boosting, and C4.5. Proceedings of the Eight
Annual Conference on Innovative Applications of Artificial Intelligence,
AAAI’96. August 4-8, 1996, Portland, Oregon, pp. 725-730.

8. Shannon, C. E. (1950). Programming a Computer for Playing Chess.
Philosophical Magazine, 41(4), 256-275.

9. Shapiro, A. D. (1987). Structured Induction in Expert Systems. Turing
Institute Press, Addison-Wesley.

10. Weill, J.-C. (1994). How Hard is the Correct Coding of an Easy Endgame. In
H. J. van den Herik et al. (eds.) Advances in Computer Chess, 7, pp. 163-176,
University of Limburg.



Meta-classifiers and Selective Superiority 



Ryan Benton, Miroslav Kubat, and Rasaiah Loganantharaj 

Center for Advanced Computer Studies
University of Louisiana, Lafayette, LA 70504
{rgb8817, mkubat, logan}@cacs.louisiana.edu



Abstract. Given that no one classification method is the best in all tasks, a
variety of approaches have evolved to prevent poor performance due to
mismatch of capabilities. One approach to overcome this problem is to
determine when a method may be appropriate for a given problem. A second,
more popular approach is to combine the capabilities of two or more
classification methods. This paper provides some evidence that the combining
of classifiers can yield more robust solutions.



1 Introduction 

This paper focuses on classifiers whose task is to assign an example, described by a 
number of symbolic or numeric attributes, to one of several known categories. In the 
Machine Learning community, it has long been accepted that no classification method 
is best for all situations. This situation, where each classification method has a 
superior performance in limited cases, is called selective superiority. Keeping this 
idea in mind, efforts in the field can be broken into three loose categories. The first is 
the ongoing efforts to develop methods that yield a high degree of accuracy and are 
applicable to a wide variety of problems. These include ongoing efforts in improving 
Decision Trees [12] and Neural Networks [14]. The second category attempts to 
determine which classifier inductor is best suited for a given problem. The third seeks 
to overcome the ‘weakness’ of a single classifier by combining the capabilities of 
several classifiers together. Both the second and third approaches are expanded upon 
in the following paragraphs. 

Empirical and theoretical studies have been initiated to determine when a given
classifier inductor is appropriate. Some of them focus on a particular type of
inductor. Examples are David Aha’s attempts to determine which problems IB1 is
capable of learning [1], Rendell et al.’s study of decision trees [13],
Salzberg et al.’s analysis of the best-case situation for the Nearest Neighbor
algorithm [15], and Wilson’s study on various protostyles of instance-based
techniques [18]. Broader studies include King et al. [7] and Gama et al. [5].
These papers substantiate the claim that all algorithms have tasks in which
they excel and others in which they underperform. However, none of the above
studies has produced a generalizable, efficient, and understandable means for
picking a classifier that is well suited for a given task.

The third approach tries to combine the predictions of a number of base classifiers. 
Figure 1 illustrates this approach. A base classifier is any classifier that makes 
predictions on examples described by various binary and continuous attributes. A 

R. Loganantharaj et al. (Eds.): IEA/AIE 2000, LNAI 1821, pp. 434-442, 2000. 

© Springer-Verlag Berlin Heidelberg 2000 







meta-classifier then assigns a label to an example using only the predictions of the 
base classifiers. 




Figure 1. A Meta-Classifier 

Techniques from this category have met with more immediate success. The 
simplest approach is the Plurality Voting (PV) method considered by Merz [10] as a 
good “straw man” method. In PV, a group of base classifiers are presented with the 
same training data. When used to classify testing examples, each base classifier 
assigns a label, which is considered the classifier’s vote, and the meta-classifier 
simply selects the label that receives the most votes. 
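
The voting step described above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used in the study; the function name `plurality_vote` and the tie-breaking rule (the first label seen among the tied ones wins) are assumptions.

```python
from collections import Counter

def plurality_vote(predictions):
    """Return the label with the most votes.

    Ties are broken in favour of the tied label that appears first
    in `predictions` (an assumption made for this sketch).
    """
    counts = Counter(predictions)
    best = max(counts.values())
    for label in predictions:
        if counts[label] == best:
            return label

# Each entry is the vote of one base classifier for the same test example.
votes = ["class1", "class2", "class1", "class1", "class2"]
print(plurality_vote(votes))
```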

A more sophisticated approach is Stacking, introduced by Wolpert [19]. In 
Stacking, a group of base classifiers are trained upon a data set. These classifiers are 
then presented with a second set, called the validation set, for which they give their 
predictions. These predictions along with the true labels are used for the induction of 
a meta-classifier of a higher sophistication than plurality voting. 
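
As a rough illustration of the Stacking idea, the sketch below trains a very simple meta-classifier on validation-set predictions. The meta-learner here is a lookup table from the tuple of base predictions to the most common true label; that choice, and the names `train_meta` and `meta_predict`, are assumptions made for brevity and are not taken from Wolpert [19].

```python
from collections import Counter, defaultdict

def train_meta(base_classifiers, validation_set):
    """Map each tuple of base predictions to its most common true label."""
    table = defaultdict(Counter)
    for x, label in validation_set:
        key = tuple(clf(x) for clf in base_classifiers)
        table[key][label] += 1
    return {key: counts.most_common(1)[0][0] for key, counts in table.items()}

def meta_predict(base_classifiers, meta_table, x, default=None):
    """Label x from the base predictions only; unseen tuples get `default`."""
    key = tuple(clf(x) for clf in base_classifiers)
    return meta_table.get(key, default)

# Two already-trained (hypothetical) base classifiers on a 1-D problem.
base = [lambda x: x > 2, lambda x: x > 6]
validation = [(1, 0), (3, 1), (5, 1), (8, 0)]   # (example, true label)
meta = train_meta(base, validation)
print(meta_predict(base, meta, 4))
```

Neither base classifier alone separates the middle region from the rest, but the combination of their predictions does, which is the kind of gain Stacking aims for.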

Another approach, presented by Brodley [3], depends on the common 
representation of examples as points in N-dimensional space. Brodley defines a 
recursive method in which a classifier is chosen to classify a region in the N- 
dimensional space. The classifier then breaks the region into two or more subregions. 
If the points assigned to the subregion all have the same label, then nothing more is 
done. If the labels of the points in the subregion do not agree, a classifier is induced 
from the points in this subregion. Hence, at the end of the training, one would have a 
decision tree whose internal nodes correspond to base classifiers and whose leaves 
correspond to a prediction. 

The success of the techniques that combine classifiers comes from their ability to 
reduce the bias error as well as the variance error [4]. The bias error is caused when 
there is a mismatch between the structure of the problem and the structure 
(representation) reasoned upon by a classifier. If each of the base classifiers uses a 
different representation, then the chance of finding a representation that matches the 
problem increases. Stacking algorithms are an example of techniques that try to 
reduce bias error. 

Variance error, on the other hand, arises when small changes in the training set can 
lead to large changes in a classifier’s ability to correctly assign labels. Schapire’s [16] 
method for constructing and combining base classifiers aims to reduce variance error 
by carefully constructing complementary classifiers. Complementary classifiers are 
ones that do not assign incorrect labels to the same example. 









All the above combination techniques tend to overlook two potential problems. 
First, meta-classifiers themselves are also classifiers. Since selective superiority 
states that no classifier is best in all situations, this means there will be cases when a 
combining algorithm will not be effective. Second, there is a distinct possibility that 
some of the base classifiers will contribute little or no new information to the 
information already provided by other base classifiers. This often occurs when two 
algorithms’ structure and variance are similar. 

This study examines a small group of simple meta-classifiers and compares their 
performance with a series of well-known base classifiers. Further, we present some 
support that the meta-classifiers, while outperforming the base classifiers on average, 
still suffer from the selective superiority problem faced by base classifiers. 



2 Methodology 

This study examines the performance of four meta-methods: plain voting, induction 
of decision trees, k-Nearest Neighbor, and neural networks. Each of the combining 
schemes has access to the same base classifiers. These base classifiers are C4.5, 
Ltree, k-Nearest Neighbor, Projection k-Nearest Neighbor, and a simple prototype 
algorithm. 

2.1 Base Classifier Induction Techniques 

The first classifier is the k-Nearest Neighbor (KNN) technique. Given a point X 
with an unknown label, the method computes the distance from X to each point in the 
training set. X is assigned the label prevailing among the k closest examples. 
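
A minimal sketch of this procedure is given below. It assumes Euclidean distance and numeric attributes, neither of which is specified in the text; the function name `knn_predict` is hypothetical.

```python
import math
from collections import Counter

def knn_predict(training_set, x, k=3):
    """training_set: list of (point, label) pairs; x: tuple of numbers."""
    # Sort the training examples by Euclidean distance to x and keep the k closest.
    nearest = sorted(training_set, key=lambda ex: math.dist(ex[0], x))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

train = [((0, 0), "a"), ((0, 1), "a"), ((5, 5), "b"), ((6, 5), "b"), ((5, 6), "b")]
print(knn_predict(train, (0.2, 0.1), k=3))
```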

A variant, the Projection k-Nearest Neighbor (PKNN) [2], breaks the training set into M 
one-dimensional sets, where M is the number of attributes. When given a point X 
with an unknown label, each attribute of X is compared to the corresponding 
one-dimensional set. Each set returns its k closest one-dimensional points. Once the M 
sets of k closest one-dimensional points are found, a simple vote takes place. The 
label winning this vote is then assigned to X. This method is believed to outperform 
the KNN algorithm in domains with many irrelevant attributes. The method is also 
unaffected by the disparate scales of each attribute. 
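
The projection variant might be sketched as follows. This is a simplified reading of the description above, not the algorithm of [2] in full; the name `pknn_predict` and the use of absolute difference as the one-dimensional distance are assumptions.

```python
from collections import Counter

def pknn_predict(training_set, x, k=3):
    """Vote over the k nearest training points found separately on each attribute."""
    votes = Counter()
    for j in range(len(x)):
        # Nearest neighbours measured on attribute j only.
        nearest = sorted(training_set, key=lambda ex: abs(ex[0][j] - x[j]))[:k]
        votes.update(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Attribute 0 separates the classes; attribute 1 is irrelevant.
train = [((0.0, 9.0), "a"), ((0.1, 1.0), "a"), ((0.2, 5.0), "a"),
         ((1.0, 8.5), "b"), ((1.1, 1.2), "b"), ((1.2, 5.1), "b")]
print(pknn_predict(train, (0.1, 1.1), k=3))
```

Because each attribute is ranked on its own axis, rescaling one attribute does not change which points are nearest on another.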

A well-known shortcoming of the KNN method is the need to retain the entire 
training set. This can be expensive in terms of storage and in classification costs. 
One approach to reduce the size of the training set is to find protovectors, sometimes 
called code vectors. Each protovector represents a sizeable portion of a category. 
The iterative technique used here is described by Tveter [17]. The method 
picks a training example and calculates its distance to the protovectors that have the 
same label. If no protovectors exist, or if the example is not within a predefined 
distance D of a protovector, the example is considered a protovector and is placed on 
a list. If the example lies within distance D of a protovector, the closest protovector is 
moved toward the example. This adjustment attempts to position the protovector in 
the center of the region it represents. This process continues until all the training 
examples are examined. Since this is done separately for each class, perfect 
classification is not guaranteed. 
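
The loop described above can be sketched as follows. The step size `rate`, the Euclidean metric, and the name `build_protovectors` are assumptions; the text only says that the closest protovector is "moved toward" the example.

```python
import math

def build_protovectors(training_set, d, rate=0.5):
    """Return a list of [vector, label] protovectors built in one pass."""
    protos = []
    for x, label in training_set:
        # Only protovectors carrying the same label are considered.
        same = [p for p in protos if p[1] == label]
        closest = min(same, key=lambda p: math.dist(p[0], x), default=None)
        if closest is None or math.dist(closest[0], x) > d:
            protos.append([list(x), label])      # the example becomes a protovector
        else:
            # Move the closest protovector a fraction `rate` of the way to x.
            closest[0] = [c + rate * (xi - c) for c, xi in zip(closest[0], x)]
    return protos

train = [((0, 0), "a"), ((1, 0), "a"), ((10, 10), "b"), ((0, 0), "b")]
for vector, label in build_protovectors(train, d=2):
    print(label, vector)
```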







Two different methods based on decision trees are used for classification. A 
typical decision tree selects one attribute with two or more values and breaks the 
current region into two or more subregions. If the points in the subregion have more 
than one label, the subregion is divided using another attribute. This continues until 
either all subregions are ‘pure’ or until some other termination criterion is satisfied. 
After the tree is built, a process called pruning takes place. This process merges some 
of the subregions together. This is done to reduce the danger of overfitting; the 
unpruned tree may not be general enough to accurately classify new points. In this 
study, Quinlan’s C4.5 [11] is used for decision tree induction. 

A variant of the typical univariate classification tree is the multivariate tree. A 
multivariate tree allows for more than one attribute to split a region into subregions. 
This provides the tree with greater representational power at the cost of greater 
complexity. The Ltree method [6] is a good example of this approach. 

Another approach to classification is the use of the Back Propagation (BP) Neural 
Network [14]. While this study does not use the BP algorithm as a base classifier, it 
is used as a meta-classifier. Typically, a BP network has three layers: the input 
layer, the hidden layer, and the output layer. Each input node corresponds to an 
attribute. The hidden layer has H nodes, and each hidden node takes the 
weighted sum of the signals coming from the input nodes. This sum is passed through a 
transfer function and the result is sent to the output layer. The output layer has L 
nodes, where L is typically the number of classes. Each output node 
sums the weighted outputs of the hidden layer and again applies a 
transfer function. When presented with a point to be classified, the BP method passes the 
point to the input layer. After the values of the output nodes are calculated, the point 
is assigned the label corresponding to the output node with the highest value. 
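
The classification (forward) pass of such a network can be sketched as below. The sigmoid transfer function, the bias terms, and the hand-picked weights are assumptions for illustration; training by back propagation is not shown.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def layer(inputs, weights, biases):
    """Weighted sum per node followed by the transfer function."""
    return [sigmoid(sum(w * v for w, v in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

def bp_classify(x, w_hidden, b_hidden, w_out, b_out):
    """Return the index of the output node with the highest value."""
    hidden = layer(x, w_hidden, b_hidden)
    out = layer(hidden, w_out, b_out)
    return max(range(len(out)), key=out.__getitem__)

# Two inputs, H = 2 hidden nodes, L = 2 output nodes (hypothetical weights).
w_hidden = [[4.0, -4.0], [-4.0, 4.0]]
w_out = [[3.0, -3.0], [-3.0, 3.0]]
print(bp_classify([1.0, 0.0], w_hidden, [0.0, 0.0], w_out, [0.0, 0.0]))
```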



2.2 Experimental Data 

Ten data sets were obtained from the UCI repository [9]. These data sets were 
preprocessed to remove any example that contained one or more missing values. Any 
data set that contained more than two classes was converted to two-class problems, 
following the conventions used by Kubat et al. [8]. This process resulted in twelve 
data sets. Finally, normalized versions of eight data sets, indicated by stars in 
Table 1, were created, resulting in a total of twenty data sets. 



Table 1. Summary of Two Class Data Sets 

Data Set   # ex.   # atts.   majority   Class 1       Class 2
*ABAL3     526     8         50.76%     {6 Rings}     {12 Rings}
*ABAL4     594     8         65.82%     {7 Rings}     {13 Rings}
*BAL2      625     4         53.92%     {tilt-left}   {tilt-right, balance}
*BC        683     9         65.01%     {2}           {4}
DERM2      101     34        51.49%     {4}           {5}
DERM3      113     34        53.98%     {2}           {5}
*ECH       61      12        72.13%     {0}           {1}
*GLASS     214     9         76.17%     {1,2,3,4}     {5,6,7}
HABER      306     3         73.53%     {1}           {2}
*HEP       80      19        83.75%     {Die}         {Live}
*IRIS3     100     4         50.00%     {2}           {3}
MACH       209     6         57.89%     [21,101]      all other values








2.3 Experimental Parameters 

For the purpose of this study, each data set is broken into three subsets: the training 
set, validation set, and test set. The training set, composed of 40% of the examples, is 
used to train the base classifiers. The validation set, which contains 30%, is presented 
to the base classifiers, and the predictions of the base classifiers are used to train the 
meta-classifiers. The accuracies of the meta-classifiers and the base classifiers are 
compared using the test set, which contains the remaining 30% of the examples. In order to 
ensure that the results give a fair picture of the classifiers’ capabilities, this process is 
repeated ten times. The training, validation, and test sets maintain the same 
proportion of class one and class two examples as the full data set. 
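
One way to produce such a split is sketched below. The per-class shuffling, the rounding rule, and the name `stratified_split` are assumptions; the paper does not say how the partition was generated.

```python
import random
from collections import defaultdict

def stratified_split(examples, fractions=(0.4, 0.3), seed=0):
    """Split (point, label) pairs into train/validation/test, class by class."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for ex in examples:
        by_label[ex[1]].append(ex)
    train, valid, test = [], [], []
    for group in by_label.values():
        rng.shuffle(group)
        n_train = round(fractions[0] * len(group))
        n_valid = round(fractions[1] * len(group))
        train += group[:n_train]
        valid += group[n_train:n_train + n_valid]
        test += group[n_train + n_valid:]      # whatever remains
    return train, valid, test

data = [(i, "one") for i in range(60)] + [(i, "two") for i in range(40)]
train, valid, test = stratified_split(data)
print(len(train), len(valid), len(test))
```

Repeating the call with ten different `seed` values would give ten independent partitions, mirroring the ten repetitions described above.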

Once the experimental training sets are created, they are presented to the base 
classifiers. For the KNN and PKNN, five values of k are experimented with. These 
values are 1, 3, 5, 7, and 9. Both C4.5 and Ltree use their default parameters. The 
parameters for the protovector method are determined experimentally for each data 
set. The minimum number of protovectors that could be found while maintaining 
good accuracy is sought. Generally speaking, only two points are normally required 
to describe the various training sets with approximately 85% accuracy. 

After training the base classifiers, their accuracy is computed. The predictions 
generated by C4.5, Ltree, and the protovector algorithm are sent to the 
meta-classifiers. The predictions of the best KNN algorithm and the best PKNN algorithm 
are also presented to the meta-classifiers. The other four KNN and PKNN methods 
are ignored. Thus, the meta-classifiers are trained on the predictions of five 
classifiers. 

For the meta-classifiers, the KNN algorithm seeks only the closest neighbor. The 
back propagation algorithm has three layers: the input, hidden and output layers. 
These layers have five nodes, two nodes, and two nodes, respectively. The default 
values for the C4.5 algorithm are used; however, both the pruned and unpruned trees 
are used on the test set. 

All paired t-tests have been performed with a target level of five-percent 
significance. 
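
For reference, the paired t statistic over the ten repetitions can be computed as in the sketch below. The accuracy values are made up for illustration, and the two-sided critical value 2.262 (nine degrees of freedom, five-percent level) assumes a two-tailed test, which the text does not state.

```python
import math
from statistics import mean, stdev

def paired_t_statistic(acc_a, acc_b):
    """t = mean(d) / (s_d / sqrt(n)) for the paired differences d = a - b."""
    diffs = [a - b for a, b in zip(acc_a, acc_b)]
    # Raises ZeroDivisionError if all differences are identical.
    return mean(diffs) / (stdev(diffs) / math.sqrt(len(diffs)))

# Hypothetical accuracies of two classifiers over ten repetitions.
meta = [0.93, 0.92, 0.94, 0.91, 0.93, 0.95, 0.92, 0.94, 0.93, 0.92]
base = [0.91, 0.92, 0.90, 0.90, 0.92, 0.93, 0.91, 0.91, 0.92, 0.90]
t = paired_t_statistic(meta, base)
print(t, abs(t) > 2.262)   # 2.262: two-sided 5% critical value for 9 d.o.f.
```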



3 Results 

Before analyzing the performance of the meta-classifiers, a couple of comments 
should be made about the base classifiers. First, as expected, there is no one classifier 
that is best for all situations. In fact, the results of paired t-tests often denote a 
significant difference. Second, the impact of normalizing data varies from data set to 
data set. While the average accuracy of the algorithms improves in some cases, it 
declines in others. 

When examining the results of the meta-classifiers, it is interesting to note that all 
five meta-classifiers have at least one data set in which they exhibit the highest 
accuracy and data sets in which their results are significantly different from the best 
performing meta-classifier. Furthermore, in 14 out of 20 cases, the best meta-classifier 
has an average accuracy equal to or better than that of the best base classifier. In the 6 cases 
where a base classifier has the highest average, there is no significant difference 
between the best meta-classifier and the best base classifier. However, only in the 
non-normalized ABAL4 set did a meta-classifier significantly improve upon the best 
base classifier. 



Table 2. Average Accuracy for Base Classifiers on Normalized Data 
Bold - Highest accuracy for row; Underline - Significantly different from highest accuracy 

[Table body not recoverable from the source.] 





Table 3. Average Accuracy for Base Classifiers on Non-Normalized Data 
Bold - Highest accuracy for row; Underline - Significantly different from highest accuracy 

[Table body not recoverable from the source; surviving column headers: 1NN, 3NN, 5NN, 7NN, 9NN, 1PNN, C4.5, Ltree.] 




Table 4. Meta-Classifier Accuracy over Normalized Data 
Bold - Highest accuracy for row; Underline - Significantly different from highest accuracy 

         1NN      C4.5-P   C4.5-U   BP       PV
ABAL3    92.8%    92.5%    92.4%    93.1%    92.6%
ABAL4    90.9%    90.1%    90.2%    90.1%    88.9%
BAL2     94.2%    94.1%    94.2%    90.3%    93.2%
BC       97.1%    97.2%    97.1%    97.2%    96.5%
ECH      97.5%    96.5%    96.5%    93.6%    98.0%
GLASS    99.5%    99.7%    99.7%    98.3%    98.0%
HEP      84.2%    83.1%    82.7%    82.8%    84.6%
IRIS3    92.3%    95.0%    95.0%    92.6%    91.7%


Based on those observations, it would appear that there is little reason to use 
meta-classifiers. However, if one compares the accuracy over all the data sets, the five 
meta-classifiers are among the six most accurate classifiers. The base classifier 
Ltree takes fifth place for both the normalized and the non-normalized data. Further, 
for the non-normalized sets, the meta-classifiers are among the six smallest standard 
deviations, with the 3-NN method placing fifth. For normalized data, the meta-classifiers 
place among the eight smallest standard deviations. The 3-NN, 5-NN, and 7-NN methods 
place second, first, and third, respectively. These results suggest that the 
meta-classifiers tend to be more stable than their base classifiers. 



Table 5. Meta-Classifier Standard Deviation over Normalized Data 

         1NN      C4.5-P   C4.5-U   BP       PV
ABAL3    1.96%    2.08%    1.99%    1.82%    1.95%
ABAL4    1.21%    1.93%    1.78%    1.70%    2.09%
BAL2     2.00%    1.36%    1.41%    12.33%   2.52%
BC       1.04%    0.85%    0.87%    0.76%    1.24%
ECH      3.54%    3.37%    3.37%    5.98%    1.44%
GLASS    1.02%    0.64%    0.64%    3.53%    1.44%
HEP      3.37%    3.24%    3.74%    3.41%    6.28%
IRIS3    6.49%    4.51%    4.51%    8.06%    5.93%



Table 6. Meta-Classifier Accuracy over Non-Normalized Data 
Bold - Highest accuracy for row; Underline - Significantly different from highest accuracy 

         1NN      C4.5-P   C4.5-U   BP       PV
ABAL3    93.3%    93.1%    93.0%    93.6%    93.5%
ABAL4    90.5%    91.2%    91.0%    91.1%    90.7%
BAL2     94.3%    94.3%    94.3%    91.2%    93.0%
BC       97.3%    97.3%    97.3%    97.2%    97.2%
DERM2    100.0%   100.0%   100.0%   96.5%    100.0%
DERM3    100.0%   100.0%   100.0%   99.2%    97.0%
ECH      97.0%    97.0%    97.0%    96.4%    96.5%
GLASS    92.1%    91.8%    91.8%    90.7%    92.4%
HABER    75.3%    74.1%    74.3%    74.7%    74.5%
HEP      78.8%    80.8%    80.0%    79.7%    83.8%
IRIS3    93.7%    95.0%    95.0%    92.0%    93.3%
MACH     77.0%    78.0%    77.7%    72.9%    77.8%



Table 7. Meta-Classifier Standard Deviation over Non-Normalized Data 

         1NN      C4.5-P   C4.5-U   BP       PV
ABAL3    1.8%     1.7%     1.7%     2.1%     2.3%
ABAL4    1.9%     1.8%     1.9%     2.0%     1.5%
BAL2     1.1%     1.2%     1.2%     11.1%    1.5%
BC       97.3%    97.3%    97.3%    97.2%    97.2%
DERM2    1.2%     1.1%     1.1%     1.1%     1.2%
DERM3    0.0%     0.0%     0.0%     12.0%    0.0%
ECH      3.5%     3.5%     3.5%     3.5%     3.4%
GLASS    4.2%     5.1%     5.1%     5.0%     4.1%
HABER    2.1%     1.4%     1.7%     2.1%     3.0%
HEP      4.5%     1.8%     3.0%     5.7%     3.5%
IRIS3    3.3%     3.9%     3.9%     7.7%     3.5%
MACH     1.7%     5.6%     4.8%     8.1%     5.5%



















































































































































































































Table 8. Overall Statistics for Meta-Classifiers 

                               KNN      C4.5-Pruned   C4.5-Unpruned   BP       PV
Meta Non-Normalized Accuracy   90.8%    91.0%         91.0%           89.6%    90.8%
Meta Non-Normalized St. Dev.   8.8%     8.8%          8.9%            9.3%     8.2%
Meta Normalized Accuracy       93.6%    93.5%         93.5%           92.3%    93.0%
Meta Normalized St. Dev.       5.4%     5.4%          5.5%            5.8%     5.5%



Table 9. Overall Statistics for Base Classifiers 

                          Best KNN   Best PkNN   Proto
Non-Normalized Accuracy   86.7%      79.2%       71.0%
Non-Normalized St. Dev.   9.1%       11.8%
Normalized St. Dev.       4.8%       7.9%

[Remaining entries, including the Ltree column and the Normalized Accuracy row, are not recoverable from the source.] 









4 Conclusions 

Selective superiority has often been a problem for single (base) classifiers. It is, in 
part, one of the motivations for developing meta-classifier techniques. The effort to 
reduce bias and variance errors is another. As the results indicate, the best 
meta-classifier is never significantly worse than the best base classifier. In this study, 
neither the KNN nor the C4.5 meta-classifier is ever significantly worse than the best 
base classifier. Even the BP and PV methods are significantly worse in only 3 out of 20 
cases. Further, the meta-classifiers are more robust than the base classifiers. Their 
overall accuracies rank in the top six while their standard deviations are among the 
lowest. However, despite their successes, the meta-classifiers are not able to eliminate 
the problem of selective superiority. Results indicate that while the meta-classifiers are 
often as good as the best base classifier, they often show significant differences in 
performance amongst themselves. This suggests that there is a need for an efficient 
method to determine when a particular meta-classifier should be used. 

To achieve this goal, a better understanding of the interrelations of the base 
classifiers needs to be achieved. Merz [10] provides a combining method that 
evaluates the base classifiers on their ability to correctly identify particular categories 
and seeks to reduce correlated errors. However, the best possible collection of base 
classifiers may not ultimately lead to success, unless the combining method is capable 
of efficiently utilizing the base classifiers. To this end, a method for describing data 
sets needs to be developed as well as a technique for applying this to determine which 
classification method would be most appropriate. 



References 

1. Aha, D. A., D. Kibler, and M. K. Albert, “Instance-Based Learning 
Algorithms”, Machine Learning, 6 (1991), 37-66 

2. Akkus, A. and H. A. Guvenir, “K Nearest Neighbor Classification On 
Feature Projections,” Proceedings of The 13th International Conference On 
Machine Learning, July 3-6, 1996, pp. 12-19. 


















3. Brodley, C. E., “Recursive Automatic Bias Selection for Classifier 
Construction”, Machine Learning, 20, 63-95 (1995) 

4. Dietterich, T. G., and Kong E. B., Machine Learning Bias, Statistical Bias, 
and Variance of Decision Tree Algorithms (Manuscript), 1995. 

5. Gama, J. and P. Brazdil, “Characterization of Classification Algorithms,” 
Seventh Portuguese Conference on Artificial Intelligence, 1995, pp. 189-200. 

6. Gama, J., and P. Brazdil, “Linear Tree”, Intelligent Data Analysis, 3 (1999), 
p. 1-22. 

7. King, R. D., C. Feng, and A. Sutherland, “Statlog: Comparison Of 
Classification Algorithms On Large Real-World Problems”, Applied 
Artificial Intelligence, 9, (1995), 289-333 

8. Kubat, M., and M. Cooperson, Jr., “Initializing RBF-Networks with Small 
Subsets of Training Examples”, Proceedings of the 16th National Conference 
on Artificial Intelligence, AAAI-99, July 18-22, 1999, pp. 188-193. 

9. Merz, C. J. and P. M. Murphy, UCI Repository of Machine Learning 
Databases [www.ics.uci.edu/~mlearn/MLRepository.html], Irvine, CA, 
University of California, Department of Information and Computer Science, 
1998. 

10. Merz, C., “Using Correspondence Analysis to Combine Classifiers”, 
Machine Learning, 36 (1999), 33-58. 

11. Quinlan, R. J., C4.5: Programs for Machine Learning, San Mateo, CA: 
Morgan Kaufmann, 1993. 

12. Quinlan, Ross J., “Induction of Decision Trees”, Machine Learning, 1 
(1986), p. 91-106. 

13. Rendell, Larry and Howard Cho, “Empirical Learning As A Function Of 
Concept Character,” Machine Learning, 5 (1990), 267-296 

14. Rumelhart, D. E., and J. L. McClelland, Parallel Distributed Processing: 
Exploration in the Microstructure of Cognition., Cambridge, MA: MIT 
Press. 

15. Salzberg, Steven, Arthur L. Delcher, David Heath, and Simon Kasif, “Best- 
Case Results For Nearest-Neighbor Learning”, IEEE Transactions on 
Pattern Analysis and Machine Intelligence, 17 (1995), 599-608 

16. Schapire, R. E., “The Strength of Weak Learnability”, Machine Learning, 5 
(1990), 197-227. 

17. Tveter, D., The Pattern Recognition Basis of AI Neural Networking 
Software [www.dontveter.com/nnsoft/nnsoft.html], 1999. 

18. Wilson, D. Randall, Prototype Styles of Generalization, (1994), [Brigham 
Young University, Department of Computer Science, 39 pages]. (Master’s 
Thesis) 

19. Wolpert, D. H., “Stacked Generalization”, Neural Networks, 5 (1992), 241- 
259. 




The Formal Specification and Implementation 
of a Modest First Order Temporal Logic 



Sharad Sachdev¹ and Andre Trudel² 



¹ Nortel Networks 
P.O. 3511, Station C, Ottawa, Ontario K1Y 4H7, Canada 
ssachdev@nortelnetworks.com 
² Jodrey School of Computer Science 
Acadia University, Wolfville, Nova Scotia, B0P 1X0, Canada 
andre.trudel@acadiau.ca 



Abstract. We present a formally specified first order temporal logic. 
We give its syntax, semantics, and describe its implementation using 
the Eclipse constraint logic programming language. The main feature of 
the implementation is a graphical user interface. Because color coded 
symbols and graphs are used, the interface assumes no logical knowledge 
on the part of the user. 



1 Introduction 

Every aspect of the world around us changes with time. Therefore if we are to use 
a computer to represent and reason about the real world, we must take time into 
account. A popular method in AI for doing this is to define a first order logic. 
These logics should have a precise syntax and semantics associated with them. 
But, to be truly useful, say for an expert system builder, an implementation for 
the logic is also needed. 

Table 1 lists some of the most popular first order temporal logics in AI that 
have appeared in the past three decades. Note that only the two simplest of the 
logics, McCarthy’s situation calculus and its reified version by Kowalski, have a 
syntax, semantics, and implementation. 

Logic                    Syntax & Semantics   Implementation
Situation Calculus [8]   Yes                  Yes
Kowalski [6]             Yes                  Yes
McDermott [9]            No                   ?
Allen [1]                No                   Yes
Galton [3]               No                   ?
Shoham [10]              Yes                  No
BTK [2]                  Yes                  No
Event Calc. [7]          No                   Yes

Table 1. Popular temporal logics 



R. Loganantharaj et al. (Eds.): IEA/AIE 2000, LNAI 1821, pp. 443-452, 2000. 
© Springer-Verlag Berlin Heidelberg 2000 






We present a formally specified first order temporal logic. We begin by giving 
an overview of the logic. We then give its syntax, semantics, and axiomatization. 
We conclude with a description of the implementation. The main feature of the 
implementation is a logic independent graphical user interface. 

2 Logic Overview 

Point based information such as “the book is on the table at time point 3” is 
represented as: point(3, on(book, table), true). Negative information at a point is 
represented by changing the “true” to “false”. For example, “John is not running 
at time point t_9”: point(t_9, running(John), false). 

We use duration of truth to represent interval based information. For example, 
if the book is on the table between times 3 and 7, then the duration of truth 
of the book being on the table over this interval is 4: 

interval(3, 7, on(book, table), 4). 

The example “John ran a while between times t_i and t_j” is true if and only if 
the duration of truth over the interval (t_i, t_j) is non-zero: 

interval(t_i, t_j, running(John), X) ∧ X > 0. 

“John ran for a total of two time units between times 1 and 10”: 

interval(1, 10, running(John), 2). 

It may not be the case that John ran for two consecutive time units. Note that 
we cannot derive anything about individual points in the interval (1,10). 

Interval based information is represented in terms of duration of truth. Duration 
of truth is measured at the point level. Point based information is treated 
as 0-1 valued functions, where 1 represents true and 0 represents false. The area under 
the 0-1 function is equal to the duration of truth. We calculate the area by integrating 
the function. See [13] for a generalization to quantitative information. 
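
The relation between point truth and duration of truth can be illustrated with a small Python sketch. It assumes the 0-1 function is given as a list of disjoint time spans on which it equals 1, so that integrating reduces to summing overlaps; the representation and the name `duration_of_truth` are assumptions and are unrelated to the authors' Eclipse implementation.

```python
def duration_of_truth(true_spans, a, b):
    """Integrate a 0-1 function over (a, b).

    true_spans: disjoint (start, end) pairs on which the function is 1.
    """
    total = 0.0
    for start, end in true_spans:
        # Length of the overlap between (start, end) and (a, b), if any.
        total += max(0.0, min(end, b) - max(start, a))
    return total

on_table = [(3, 7)]            # the book is on the table from 3 to 7
running = [(2, 3), (6, 7)]     # John runs during two separate unit spans
print(duration_of_truth(on_table, 3, 7))
print(duration_of_truth(running, 1, 10))
```

The second call yields a total of two time units even though John never runs for two consecutive units, matching the remark above.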

3 Formal Specification 

3.1 Syntax 

Given a set of constant symbols $C$, variable symbols $V$, and function symbols $F$, 
terms are defined as: 

- All members of $C$ and $V$ are terms. 

- If $r_1, r_2$ are terms, then $(r_1 + r_2)$, $(r_1 - r_2)$, and $(r_1 \times r_2)$ are terms. 

Well-formed formulas (wffs) are defined as: 

- If $\pi_1$ and $\pi_2$ are terms then $\pi_1 < \pi_2$, $\pi_1 \le \pi_2$, $\pi_1 > \pi_2$, $\pi_1 \ge \pi_2$, and $\pi_1 = \pi_2$ 
are wffs. 

- If $\pi_1, \pi_2, \pi_3, r_1, \ldots, r_n$ are terms and $f \in F$, then 
$interval(\pi_1, \pi_2, f(r_1, \ldots, r_n), \pi_3)$, $point(\pi_1, f(r_1, \ldots, r_n), true)$, and 
$point(\pi_1, f(r_1, \ldots, r_n), false)$ are wffs. 

- If $\varphi_1, \varphi_2$ are wffs and $z \in V$, then $[\varphi_1 \wedge \varphi_2]$, $[\varphi_1 \vee \varphi_2]$, $[\varphi_1 \rightarrow \varphi_2]$, $[\varphi_1 \leftrightarrow \varphi_2]$, 
$[\neg \varphi_1]$, $[\forall z . \varphi_1]$, and $[\exists z . \varphi_1]$ are wffs. 

When there is no ambiguity, parentheses and square brackets are sometimes 
omitted. 

3.2 Semantics 

The semantic domain or ontology is $\mathbb{R}$. An interpretation is a tuple $I = \langle MC, SF, MF \rangle$ 
where: 

- $MC: C \mapsto \mathbb{R}$. 

- $SF$ is a set of piecewise continuous 0-1 functions. Each element of $SF$ is a 
function from $\mathbb{R}^n$ to $\{0, 1\}$ for some $n$. 

- $MF: F \mapsto SF$. 

A variable assignment is a function $VA: V \mapsto \mathbb{R}$. The function $TA$ assigns an 
element of $\mathbb{R}$ to each term as follows: 

- If $x \in C$ then $TA(x) = MC(x)$. 

- If $x \in V$ then $TA(x) = VA(x)$. 

- If $r_1, r_2$ are terms, then: 

$TA((r_1 + r_2)) = TA(r_1) + TA(r_2)$, 

$TA((r_1 - r_2)) = TA(r_1) - TA(r_2)$, 

$TA((r_1 \times r_2)) = TA(r_1) \times TA(r_2)$. 

The interpretation $I = \langle MC, SF, MF \rangle$ and variable assignment $VA$ satisfy a 
formula $\varphi$ (written $\models_I \varphi\,[VA]$) under the following conditions: 

- $\models_I interval(\pi_1, \pi_2, f(r_1, \ldots, r_n), \pi_3)\,[VA]$ iff 
$\int_{TA(\pi_1)}^{TA(\pi_2)} MF(f)(TA(r_1), \ldots, TA(r_n), t)\,dt = TA(\pi_3)$. Note that the above 
definite integral is always defined because the integrand is a piecewise 
continuous function. 

- $\models_I point(\pi, f(r_1, \ldots, r_n), true)\,[VA]$ iff $MF(f)(TA(r_1), \ldots, TA(r_n), TA(\pi)) = 1$. 

- $\models_I point(\pi, f(r_1, \ldots, r_n), false)\,[VA]$ iff $MF(f)(TA(r_1), \ldots, TA(r_n), TA(\pi)) = 0$. 

- $\models_I \pi_1 < \pi_2\,[VA]$ iff $TA(\pi_1) < TA(\pi_2)$. 

- $\models_I \pi_1 \le \pi_2\,[VA]$ iff $TA(\pi_1) \le TA(\pi_2)$. 

- $\models_I \pi_1 > \pi_2\,[VA]$ iff $TA(\pi_1) > TA(\pi_2)$. 

- $\models_I \pi_1 \ge \pi_2\,[VA]$ iff $TA(\pi_1) \ge TA(\pi_2)$. 

- $\models_I \pi_1 = \pi_2\,[VA]$ iff $TA(\pi_1) = TA(\pi_2)$. 

- $\models_I [\varphi_1 \wedge \varphi_2]\,[VA]$ iff $\models_I \varphi_1\,[VA]$ and $\models_I \varphi_2\,[VA]$. 

- $\models_I [\varphi_1 \vee \varphi_2]\,[VA]$ iff $\models_I \varphi_1\,[VA]$ or $\models_I \varphi_2\,[VA]$. 

- $\models_I [\varphi_1 \rightarrow \varphi_2]\,[VA]$ iff $\models_I [\neg \varphi_1 \vee \varphi_2]\,[VA]$. 

- $\models_I [\varphi_1 \leftrightarrow \varphi_2]\,[VA]$ iff $\models_I [\varphi_1 \rightarrow \varphi_2]\,[VA]$ and $\models_I [\varphi_2 \rightarrow \varphi_1]\,[VA]$. 

- $\models_I [\neg \varphi]\,[VA]$ iff $\not\models_I \varphi\,[VA]$. 

- $\models_I [\forall z . \varphi]\,[VA]$ iff $\models_I \varphi\,[VA']$ for all $VA'$ that agree with $VA$ everywhere 
except possibly on $z$. 

- $\models_I [\exists z . \varphi]\,[VA]$ iff $\models_I \varphi\,[VA']$ for some $VA'$ that agrees with $VA$ everywhere 
except possibly on $z$. 

3.3 Axioms 

We have interval based axiom schemas which capture basic integral properties: 

$\forall A, B.\ interval(A, B, f, B - A) \rightarrow [\forall t.\ A < t < B \rightarrow point(t, f, true)]$ (1) 

$\forall A, B.\ interval(A, B, f, 0) \rightarrow [\forall t.\ A < t < B \rightarrow point(t, f, false)]$ (2) 

$\forall X, Y.\ interval(X, Y, f, Y - X) \rightarrow [\forall A, B.\ X < A < B < Y \rightarrow interval(A, B, f, B - A)]$ (3) 

$\forall X, Y.\ interval(X, Y, f, 0) \rightarrow [\forall A, B.\ X < A < B < Y \rightarrow interval(A, B, f, 0)]$ (4) 

$\forall A, B, X, Y, Z.\ interval(X, Y, f, A) \wedge interval(Y, Z, f, B) \rightarrow interval(X, Z, f, A + B)$ (5) 

$\forall A, B, Q, X, Y, Z.\ interval(A, B, f, Q) \wedge interval(X, Y, f, Z) \wedge A < X < Y < B \wedge Q < B - A$ 
$\rightarrow 0 \le Z \le Y - X \wedge Q - ((B - A) - (Y - X)) \le Z \le Q$ (6) 

Axioms (1,3) specify that if something is true over an interval, then it is also true 
over every subinterval and point within the interval. Similarly for false over an 
interval (i.e., axioms (2,4)). Axiom (5) captures the additive property of adjacent 
intervals. Restrictions are placed on subintervals in axiom (6) when information 
is not true over a whole superinterval. 
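
The additive property and the subinterval bounds can be checked numerically on a concrete 0-1 function. The sketch below is an illustration only: it assumes the function is given as disjoint true spans, and the helper names `duration`, `additive_ok`, and `bounds_ok` are hypothetical.

```python
def duration(true_spans, a, b):
    """Duration of truth over (a, b) for a 0-1 function given as disjoint true spans."""
    return sum(max(0.0, min(e, b) - max(s, a)) for s, e in true_spans)

def additive_ok(spans, x, y, z):
    """Additivity over adjacent intervals (x, y) and (y, z)."""
    return abs(duration(spans, x, y) + duration(spans, y, z)
               - duration(spans, x, z)) < 1e-9

def bounds_ok(spans, a, b, x, y):
    """Bounds on the duration over a subinterval (x, y) of (a, b)."""
    q = duration(spans, a, b)
    z = duration(spans, x, y)
    lower = max(0.0, q - ((b - a) - (y - x)))
    upper = min(y - x, q)
    return lower <= z <= upper

running = [(2, 3), (6, 7)]
print(additive_ok(running, 1, 5, 10), bounds_ok(running, 1, 10, 2.5, 6.5))
```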

4 Implementation 

A natural choice for implementing a first order logic is Prolog. Unfortunately, 
Prolog is not powerful enough to implement the logic described in this paper. 
For example, a reasonable Prolog representation of “I’ll be home after 5pm” is: 

point(T, atHome, true) :- T > 5. 

The query “will I be home at 10pm?” (i.e., ?- point(10, atHome, true)) succeeds. 
But the query “will I be home after 10pm?” (i.e., ?- point(X, atHome, true) & 
X > 10) fails. The expected answer is yes. Failure occurs because in order to 
prove point(X, atHome, true) we must prove the sub-goal X > 5, which fails. 
Both sides of the greater than sign must be bound in order for it to succeed 
(e.g., 6 > 5). Prolog is not capable of associating a constraint with a variable 
(e.g., point(X, atHome, true) should succeed with the constraint that X > 5). 

It is difficult, but possible, to add constraints to Prolog’s variables. For example, 
in [11], calls are made to Maple from within Prolog to maintain and 
solve systems of inequalities related to the variables. A preferable solution is 
to use constraint logic programming (CLP). As the name implies, CLP is the 
combination of logic programming and variable constraints. 

BNR Prolog is an example of a CLP system which handles constraints over 
intervals. See [12] for the description of a temporal logic implemented in BNR 
Prolog. 

IBM’s CLP(R) is a CLP system that allows constraints over the real numbers. 
A formally specified temporal logic implemented in CLP(R), which supports 
certain types of uncertainty, is described in [4]. 

For the logic described herein, we chose Eclipse as the CLP implementation 
language. The system’s components are shown in figure 1. Specific information 
about a particular problem domain is stored in the knowledge base as Eclipse 
clauses. An example is given in appendix B. 

[Figure 1: block diagram with the components User, GUI, Translator, Inference Engine, and Knowledge Base] 

Fig. 1. System design 

The inference engine implements the logic’s axioms. See appendix A for a 
printout of the code. To pose a query, the user must enter “int” and “pt” instead 
of “interval” and “point”. The abbreviations are used in order to avoid infinite 
loops, since point and interval information are recursively defined in terms of one 
another. “int” and “pt” refer to clauses in the inference engine, and “interval” 
and “point” to the knowledge base. 

The major drawback with using the inference engine directly is that the user 
must be familiar with the logic’s syntax, semantics, and logical formalisms in 
general. It is not intended for the naive user. To overcome this drawback, we 






implemented a graphical user interface (GUI) which assumes no mathematical 
knowledge on the user’s part. The GUI allows the user to enter, query, and receive 
temporal information using color-coded symbols. 

4.1 GUI 

The GUI provides the environment for representing, entering and querying infor- 
mation. The translator is a set of Tcl instructions, which provides a link between 
the graphical user interface and the inference engine. It maps the information 
entered by the user onto the logical form and stores it as facts in the knowledge 
base and sends queries to the inference engine. The result of the query is sent 
back to the translator, where it is mapped from logical to user understandable 
graphical form. 



Entering Information An event is defined as a single item of temporal in- 
formation. Every event has a name, description and a temporal component. For 
example, “John had a meeting at 4 p.m.” The event name for this event is “Meet- 
ing” , the event description is “John had a meeting at 4 p.m.” and the temporal 
component is that it is true at 4 p.m. There are different categories of events 
that can be represented: 

— Point event: Events which occur at precise points. For example, John called 
at 2:30 a.m., 4:45 a.m., 6:15 a.m. and 12 p.m. 

— Limitless Event: An event that started at some unknown time in the past and 
continues until an unknown time in the future. For example, it was raining 
throughout the day. 

— Fixed Event: Events which have a precise starting and ending point. For 
example, I went jogging between 6 a.m. and 8 a.m. 

— FixedLeft Event: An event which starts at a known fixed point and continues 
into the future. For example, the basketball game started at 9 a.m. and 
finished in the afternoon. 

— FixedRight Event: An event that started at some unknown time in the past 
and has a fixed ending point. For example, the snow storm started last night 
and ended today at 10 a.m. 

It is possible to represent events that require the use of more than one type of 
symbolic icon. For example: 

The telephone switch had its peak load at 4 a.m., 6 a.m., 10 a.m., 11:30 
a.m. and between 10:00 and 11:30 a.m. 

This example requires the use of “Point Event” and “Fixed Event” to capture 
the event details. The actual input of this example is shown in the first screen 
shot in appendix D. The user must enter the text in the boxes at the top of the 
window and drag the appropriate icons to the time bar. 
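One way to picture the event categories is as a single record holding points and intervals. The sketch below is a hypothetical Python data structure, not the GUI's actual representation (which stores Eclipse facts via the Tcl translator). As a simplifying assumption, unknown start or end times are modeled as infinite, and times are hours on a 24-hour scale.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Event:
    """An event with a name, a description, and a temporal component."""
    name: str
    description: str
    points: list = field(default_factory=list)     # Point events
    intervals: list = field(default_factory=list)  # (start, end) pairs

    def add_point(self, t):
        self.points.append(t)

    def add_fixed(self, start, end):
        self.intervals.append((start, end))

    def add_fixed_left(self, start):
        # Known start, unknown end (assumption: treated as open-ended).
        self.intervals.append((start, math.inf))

    def add_fixed_right(self, end):
        # Unknown start (assumption: treated as open-ended), known end.
        self.intervals.append((-math.inf, end))

    def add_limitless(self):
        self.intervals.append((-math.inf, math.inf))

    def holds_at(self, t):
        """True if the event is recorded as true at time t."""
        return t in self.points or any(s <= t <= e for s, e in self.intervals)


# The telephone switch example combines Point and Fixed events.
peak = Event("Peak load", "The telephone switch had its peak load.")
for hour in (4.0, 6.0, 10.0, 11.5):
    peak.add_point(hour)
peak.add_fixed(10.0, 11.5)
```

A query such as "is the event true at the given time?" then reduces to a call like peak.holds_at(10.75).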






Querying the System The user can perform the following queries: 

— Is an event true at the given time? 

— What is true about an event? 

The layout of the query window is similar to the enter information window. 
Providing consistency among different windows is essential to the design of the 
graphical user interface. 

Assume the knowledge base contains the fact that it has been raining all day. 
The user can use the second window in appendix D to find out if it was raining 
between 4 and 7am. The user first selects the “Rain” event from the dropdown 
list box and confirms the selection by clicking on the “OK” button. The 
user then moves the “Fixed Event” icon to the time bar. Finally, clicking on the 
“Query” icon displays a dialog window showing the time at which the event is 
true. 

A user can also click on “What’s True” to find out when a selected event is 
true. For example in the third window in appendix D, the event “Phone calls” 
is true at points 2:30 a.m., 4:45 a.m., 6:15 a.m. and 12 p.m. 



User Evaluation We recruited novice, intermediate and expert computer users 
to evaluate the GUI. The conclusions drawn from the results of the user satis- 
faction questionnaire are: 

— The general reaction to the interface was favorable, although the flexibility 
was not highly regarded. 

— Icons are only appreciated by some users, hence an alternative should be 
provided. 

— The error messages are adequate and useful in helping the users to perform 
the correct procedures. 

— Learning to operate the interface is simple and straightforward. Understand- 
ing the functionality takes time. 

— The on-line help is beneficial to the user. 

— The interface can be used by both novices and experienced users efficiently. 

— Correcting mistakes should be as easy as possible. 



Summary The GUI allows the user to enter, query, and represent temporal in- 
formation using color-coded symbols. The user does not have to be familiar with 
the particular logic used by the implementation. In fact, the underlying tempo- 
ral representation system can be replaced by another. The translator component 
would need to be updated. No changes would be required in the GUI. 

Following the design of the interface and the background research that was 
required, the following conclusions were drawn: 

— The inference engine and knowledge base of the system are kept transparent 
to the user. 






— The loose coupling between the implementation and interface makes the 
design more robust. One can change the underlying implementation without 
changing the graphical user interface. 

— The evaluation of a user interface is a valuable process in making the design 
decisions that improve user satisfaction and quality of the interface. 

— The user does not have to be familiar with the particular logic used by the 
implementation. It is easy to use the GUI for entering, representing and 
querying information compared to using Eclipse and the logic directly. 



Future Work The functionality of the GUI can be improved in several areas: 

— Handle more categories of events. For example, John played squash and 
tennis for equal amounts of time between 5 p.m. and 7 p.m. 

— Handle true, false and unknown information. For example, Henry was in 
Halifax yesterday but he will not be in Halifax today. We do not know 
where he will be tomorrow. 

— Handle conjunction and disjunction. For example, I was either at Tim 
Hortons or the Coffee Merchant for lunch. 

— Handle implications. For example, my dog sits whenever I whistle. 

— Handle temporal constraints between events [5]. For example, the passengers 
disembarked after the plane landed. 

— Add a feature to delete or modify the events already entered into the system. 

5 Conclusions 

Few temporal logics in AI are formally specified with a syntax and semantics, and 
in addition are implemented. To be truly useful, all three of these components 
must be present. We presented such a qualitative temporal logic. 

To make the logic useful to a wider audience, we implemented a GUI. The 
GUI assumes no logical knowledge on the part of the user, and uses color coded 
symbols and graphs for input. To date, the GUI is equivalent to the propositional 
version of the logic. We plan to extend its capabilities in the future. 

The system is designed so that the GUI and logical implementation are trans- 
parent to one another. Either one can be replaced. 

Future work involves applying the implementation to a real world problem. 
One possibility is a meteorological problem domain. 

The unique feature of this work is a formally specified temporal logic imple- 
mented with a GUI. 

Acknowledgements 

The first author was supported by a TARA graduate student scholarship. The 
second author is supported by an NSERC research grant. 






References 

1. J. F. Allen: Towards a General Theory of Action and Time. Artificial Intelligence 
23 (2), (1984), p. 123-154. 

2. F. Bacchus, J. Tenenberg and J. A. Koomen: A Non-Reified Temporal Logic. First 
International Conference on Principles of Knowledge Representation and Reasoning, 
Toronto, Canada, (1989), p. 2-10. 

3. A. Galton: A Critical Examination of Allen’s Theory of Action and Time. Artificial 
Intelligence 42 (2-3), (1990), p. 159-188. 

4. E. Ho and A. Trudel: The specification and implementation of a first order logic for 
uncertain temporal domains. The Tenth Canadian Artificial Intelligence Conference 
(AI’94), May 16-20, Banff, Alberta, Canada, (1994), p. 205-212. 

5. L. Hoebel, W. Lorensen, and K. Martin: Integrating graphics and abstract data 
to visualize temporal constraints. Sigart Bulletin, ACM Press, 9 (3-4), (1998), p. 
18-23. 

6. R. A. Kowalski: Logic for problem solving. Elsevier North Holland, New York, 
(1979). 

7. R. A. Kowalski and M. Sergot: A Logic-based Calculus of Events. New Generation 
Computing 4, (1986), p. 67-95. 

8. J. McCarthy: Programs with Common Sense. Appears in Readings in Knowledge 
Representation, Morgan Kaufmann, R. J. Brachman and H. J. Levesque editors, 
Los Altos, USA, (1985), p. 299-307. 

9. D. V. McDermott: A temporal logic for reasoning about processes and plans. 
Cognitive Science 6 (2), (1982), p. 101-155. 

10. Y. Shoham: Reasoning about Change. The MIT Press, Massachusetts, (1988). 

11. A. Trudel: Representing and reasoning about a dynamic world. PhD. Thesis, 
University of Waterloo, Waterloo, Canada, (1990). 

12. A. Trudel: An implementation of a temporal logic. The Fourth UNB Artificial 
Intelligence Symposium, Sept 20-21, Univ. of New Brunswick, Fredericton, NB, 
Canada, (1991), p. 401-411. 

13. A. Trudel: A temporal knowledge representation approach based on elementary 
calculus. Computational Intelligence 13 (4), (1997), p. 465-485. 



Appendix A: Inference Engine 



:- use_module(library(fd)). 

%%%%%%%%%%%% Inference Engine %%%%%%%%%%%%%%% 

% To eliminate infinite loops in the code, the user types in 
% pt(t,f,x) for point(t,f,x) and 
% int(a,b,f,x) for interval(a,b,f,x). 

% solve directly 
pt(T,F,X) :- point(T,F,X). 

% Axiom 1 
% F is true throughout some interval containing T. 
pt(T,F,true) :- A #< T, B #> T, C #= B-A, 
    interval(A,B,F,C). 

% Axiom 2 
% F is false throughout some interval containing T. 
pt(T,F,false) :- A #< T, B #> T, interval(A,B,F,0). 

% solve directly 
int(A,B,F,X) :- interval(A,B,F,X), !. 

% Axiom 3 
% F is true over a super-interval of (A,B) 
int(A,B,F,C) :- X #<= A, Y #>= B, Z #= Y-X, 
    interval(X,Y,F,Z), C #= B-A. 

% Axiom 4 
% F is false over a super-interval of (A,B) 
int(A,B,F,0) :- X #<= A, Y #>= B, interval(X,Y,F,0). 

% Axiom 5: Sub-divide the interval 
int(A,B,F,C) :- X #<= A, Y #> A, Y #< B, Z #= Y-X, 
    interval(X,Y,F,Z), int(Y,B,F,Temp), C #= Y - A + Temp. 
int(A,B,F,C) :- X #<= A, Y #> A, Y #< B, 
    interval(X,Y,F,0), int(Y,B,F,C). 

% Axiom 6 
% Place constraints on the value of the interval. 
int(X,Y,F,Z) :- interval(A,B,F,Q), 
    A #<= X, X #< Y, Y #<= B, Q #<= B-A, 
    Z #>= Q-((B-A)-(Y-X)), Z #>= 0, Z #<= Q, Z #<= Y-X. 

Appendix B: Knowledge Base 

Appendix C: Sample Queries Using the Inference Engine 
and Knowledge Base 

Appendix D: GUI Screen Shots 

Due to space limitations, the remaining appendices have been omitted. 



Determining Effective Military Decisive Points 
through Knowledge-Rich Case-Based Reasoning 



David E. Moriarty 

USC Information Sciences Institute 
4676 Admiralty Way, Marina del Rey, CA 90292 
moriarty@isi.edu 



Abstract. This paper describes our efforts in the DARPA High Perfor- 
mance Knowledge Bases (HPKB) program to solve a difficult military 
planning problem. We describe a case-based reasoning solution within a 
knowledge-rich environment that leverages both examples and rule-based 
knowledge. Our solution uses an innovative combination of nearest neigh- 
bor, neural networks, and natural deduction. Results from a DARPA- 
sponsored evaluation and our own experiments demonstrate the utility 
of our solution and the contribution of the technology components. 



1 Introduction 

The DARPA High Performance Knowledge Bases (HPKB) program has detailed 
a number of military problems that challenge the state of the art in artificial 
intelligence. These problems cover areas where automated solutions could greatly 
benefit the military, but where automation has proven difficult with traditional 
methods. This paper describes our solution to one of these problems. 

One of the most difficult tasks in military planning is determining an ap- 
propriate point of focus, called a decisive point. Military experts believe that 
identifying effective decisive points is an art, where proficiency comes through 
experience (Jones, 1999). Experts cannot verbalize their reasoning process and 
thus it has proven difficult to automate with traditional expert systems. 

Our solution avoids this knowledge acquisition bottleneck by acquiring knowl- 
edge from examples rather than expert rules. Experts can provide examples or 
case histories of decisive points. These examples can be used as a case base for a 
case-based reasoner or as a set of training examples for an inductive learning al- 
gorithm. In both situations, the examples provide significant knowledge content. 
This paper focuses on the case-based reasoning (CBR) approach. 

The decisive point problem presents two important challenges to CBR: how 
to manage relational or “structured” case knowledge and how to fuse expert, 
rule-based knowledge. The system presented in this paper addresses each of 
these challenges and represents a novel contribution to CBR. The reasoner is 
part of the KILTER learning toolset within the PowerLoom knowledge repre- 
sentation system and uses an innovative combination of nearest neighbor, graph 
search, neural networks, and natural deduction to build, match, and reason with 
relational case knowledge. Since the reasoner is implemented within PowerLoom, 
it also exploits any existing rule-based knowledge about the problem. 



R. Loganantharaj et al. (Eds.): IEA/AIE 2000, LNAI 1821, pp. 453-462, 2000. 
© Springer-Verlag Berlin Heidelberg 2000 






2 The Decisive Point Problem 

2.1 Problem Description 

A course of action (COA) is defined by the US military as a sketchy plan that 
describes how a military unit is to carry out its mission. COAs are generated 
by a military planning staff through the military decision-making process, where 
numerous competing COAs are developed and analyzed. All COAs have a point 
of focus, called the decisive point, where the military directs its combat effort. 
Decisive points normally refer to a feature on a map such as a geographic region 
or a specific military unit. The most effective decisive points match a military 
strength against an enemy weakness. 

Military planners place great emphasis on understanding and exploiting the 
decisive points. Unfortunately, military experts agree that there is no general 
procedure for determining effective decisive points (Jones, 1999). Decisive points 
are normally chosen from a “gut feeling” rather than by following strict military 
doctrine. Military students become adept at recognizing good decisive points 
simply through trial and error using feedback on whether the missions were 
accomplished. Unfortunately, students and experts find it difficult to translate 
their knowledge into a concise set of rules. 

While experts cannot provide a complete set of rules to describe their reason- 
ing, they can provide knowledge fragments that are incomplete, but still useful. 
Some of this knowledge is domain general such as knowledge about geography 
and geographic relations and some is specific to decisive points. One example 
of a domain specific knowledge fragment is the fact that linear features such as 
rivers and borders are not good decisive points because they do not provide a 
single point of focus. An example of domain general knowledge is the fact that 
if x is east of y, then y is west of x. In addition to knowledge fragments, experts 
can provide examples of decisive point choices along with a measure of goodness. 

The challenge problem is as follows. Given several knowledge fragments from 
experts and a set of decisive point cases, build a system to evaluate future decisive 
points. The system should input all scenario information and output a ranking 
of the best decisive points. 

2.2 A Case-Based Reasoning Solution 

Given the lack of existing knowledge and the availability of decisive point ex- 
amples, a machine learning approach is appropriate. Case-based reasoning is a 
branch of machine learning (sometimes called lazy learning) where solutions to 
previous problems are adapted and reused in similar future problems. In CBR, 
each example is explicitly stored in a case base. Once a query example is pre- 
sented, the reasoner passes through two phases: matching and adaptation. In the 
matching phase, the reasoner compares the query example to each stored case 
and computes a distance measure. In the adaptation phase, the solutions from 
the cases with the lowest distance are combined and adapted to fit the query. 
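The two phases can be sketched in a few lines of Python. This is a generic illustration under stated assumptions, not the paper's reasoner: cases are assumed to be flat numeric vectors, distance is assumed to be Euclidean, adaptation is assumed to be a plain average of the neighbors' solutions, and all names are hypothetical.

```python
import math


def euclidean(u, v):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def retrieve(query, case_base, k):
    """Matching phase: return the k stored cases closest to the query."""
    ranked = sorted(case_base, key=lambda case: euclidean(query, case[0]))
    return ranked[:k]


def adapt(neighbors):
    """Adaptation phase: combine the neighbors' solutions by averaging."""
    return sum(solution for _, solution in neighbors) / len(neighbors)


# Each case is (feature vector, solution value).
case_base = [((0.0, 0.0), 0.2), ((1.0, 0.0), 0.4), ((5.0, 5.0), 0.9)]
estimate = adapt(retrieve((0.5, 0.0), case_base, k=2))
```

Because every case is kept and compared at query time, nothing is computed during training, which is why the approach is sometimes called lazy learning.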

We adopted a case-based approach over other machine learning methods for 
several reasons. First, military experts are very receptive to case-based reasoning 
because it often reflects their own reasoning. Second, case-based reasoning 
provides human-understandable explanations. Along with the answer, a 
case-based approach can note the relevant previous cases. The final motivation is 
that case-based methods do not form an explicit representation of the hypothesis 
and thus have almost no computational overhead at training time. 

Fig. 1. A partial decisive point description. 

2.3 Technical Hurdles 

We identified two major challenges in applying case-based reasoning to the 
decisive point challenge problem: managing relational case knowledge and fusing 
rule-based and example-based knowledge. Each challenge is outlined below. 

Relational Case Knowledge. When evaluating a potential decisive point, 
experts rely on several different sources of knowledge. They consider the overall 
mission and its objectives, knowledge about the terrain and other geographic 
features, and knowledge about specific military units. Since these sources are 
critical in evaluating a decisive point, they should be included in the decisive 
point case description. 

It is difficult to imagine a non-relational representation for this knowledge. 
Consider knowledge about the terrain, which describes geographical features 
such as rivers, mountains, and roads. These objects have properties such as 
position, and length, but the most interesting characteristics are how they are 
related to one another. For example, a mountain may be between two military 
units, which indicates that these units are blocked from each other. This type 
of knowledge requires relations between instances and resembles a structured 
hypergraph where nodes reflect instances and arcs represent relations. Figure 1 
gives an example of the relational knowledge in decisive point cases. 

Unfortunately, almost all implementations of case-based reasoning operate on 
a flat set of feature values and cannot match structured cases (Gebhardt, 1997; 
Kolodner, 1993). A case-based solution to the decisive point problem therefore 
requires either an algorithm to translate the relational knowledge into a feature 
vector representation or a case-matcher that can directly compare structured 
cases. In section 3.2, we describe a case-based system that adopts both strategies. 

A second problem with relational case knowledge concerns the dimensions 
along which two cases are compared. Traditional case-based methods assume 






that this information is given a priori (e.g., features in the feature vector). Un- 
fortunately, when reasoning with relational knowledge we are normally not af- 
forded this luxury. Decisive point cases can be described using any number of 
instances, properties, and relations. There is no single convention for describing 
a decisive point case. Since we do not have a set of common properties, we must 
devise another strategy for generating the dimensions for comparison. 

Combining Rule-Based Knowledge with Examples. Traditional case-based 
reasoning methods utilize one source of knowledge: the case base. Case-based 
reasoning, along with most machine learning methods, normally ignores 
existing knowledge about the domain. The rationale is twofold. First, in most 
domains training data is plentiful and good performance can be achieved with 
examples alone. Second, it is not obvious how these methods could exploit such 
knowledge. 

In the HPKB challenge problem, the training data was rather sparse. We 
were given only 50 cases of previously identified decisive points, which is a lim- 
ited sampling of the problem space. Thus, to perform well, one cannot rely on 
examples alone. Unfortunately, little work has been done to develop methods 
that combine both rule-based and example-based knowledge. 

3 Knowledge-Rich Case-Based Reasoning 

To overcome the above challenges, we implemented a case-based reasoner in the 
PowerLoom knowledge representation system. The reasoner combines nearest 
neighbor, neural networks, graph search, and natural deduction to reason with 
relational, example-based knowledge and rule-based knowledge. We characterize 
this approach as knowledge-rich case-based reasoning. 

Our decisive point solution can be summarized as follows. Cases are stored 
in PowerLoom as a set of relational facts. Criteria for comparing two cases are 
generated using a graph search algorithm called GSA. PowerLoom then maps 
each case into the criteria and constructs match vectors. The match vectors are 
fed into a neural network which is trained to build semantic signatures for each 
case. The signatures are stored in the case base to be matched by queries. 

Given a new scenario, the algorithm evaluates each map feature (e.g., regions, 
units, etc.) as a potential decisive point. Given a map feature, the algorithm 
queries PowerLoom to determine if any rules match this feature and can infer 
goodness. If goodness cannot be inferred, the algorithm invokes the case-based 
reasoner. The trained neural network computes a signature for the map feature 
which is subsequently compared to each case signature. The goodness of the map 
feature is the average of the goodness of the top three closest matching cases. 
Once all features have been evaluated, the top three points are returned. 
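The control flow of this evaluation loop can be sketched as follows. The Python below is a hedged outline, not the PowerLoom implementation: rules are assumed to be functions that return a goodness value or None, signatures are assumed to be precomputed tuples, and the 0.0 score for linear features is an invented value used only to make the example concrete.

```python
def evaluate_feature(feature, rules, signature_of, case_base, distance, k=3):
    """Score one map feature: rule knowledge first, case-based fallback."""
    for rule in rules:
        goodness = rule(feature)
        if goodness is not None:          # a rule could infer goodness
            return goodness
    signature = signature_of(feature)
    nearest = sorted(case_base, key=lambda c: distance(signature, c[0]))[:k]
    return sum(g for _, g in nearest) / len(nearest)


def rank_decisive_points(features, top_n=3, **kwargs):
    """Evaluate every feature and return the top_n best-scoring ones."""
    scored = [(evaluate_feature(f, **kwargs), f) for f in features]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [f for _, f in scored[:top_n]]


def linear_feature_rule(feature):
    # Hypothetical encoding of the expert fragment that linear features
    # make poor decisive points; the score 0.0 is an assumption.
    return 0.0 if feature["kind"] in ("river", "border") else None


# Each case is (signature, expert goodness); all values are made up.
case_base = [((0.1,), 0.2), ((0.2,), 0.3), ((0.8,), 0.9),
             ((0.9,), 1.0), ((0.85,), 0.8)]
features = [
    {"name": "River1", "kind": "river", "sig": (0.9,)},
    {"name": "Bridge1", "kind": "bridge", "sig": (0.85,)},
    {"name": "Hill1", "kind": "hill", "sig": (0.15,)},
]
ranking = rank_decisive_points(
    features,
    rules=[linear_feature_rule],
    signature_of=lambda f: f["sig"],
    case_base=case_base,
    distance=lambda u, v: abs(u[0] - v[0]),
)
names = [f["name"] for f in ranking]
```

In this toy run the river is scored by the rule alone, while the bridge and the hill fall through to the three nearest cases.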

The case-based algorithm follows a general k-nearest neighbor strategy with 
several innovations for managing structured cases and existing rule knowledge. 
The remainder of this section describes these enhancements. 

3.1 Generating Match Criteria 

To judge similarity between two cases, one must know the dimensions along 
which cases may vary. Unfortunately, as described above, such criteria are not 
obvious when using relational case representations. This section describes an 
algorithm called GSA (Generalized Structural Assertions) that uses a heuristic 
approach to generate match criteria from a relational case base. GSA capitalizes 
on one simple idea: the most important criterion for judging similarity between 
any two cases is the set of facts that have been used to describe the cases. 

Fig. 2. A generalized case. Variables are substituted for all non-leaf instances. 

The GSA algorithm has two main phases. First, GSA collects and generalizes 
the facts that describe each case. GSA traverses the structure of each case to 
a given depth and records all links. The algorithm follows that of Emde (1996) 
and is as follows. Starting with the root instance of the case, generate and record 
all asserted facts. Repeat for each new instance found in the new facts until a 
specified depth limit d is reached. The knowledge structure in figure 1 represents 
the facts at depth limit two for a case rooted at EA1. 
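The depth-limited collection step can be sketched as a breadth-first traversal. The Python below is an illustrative reading of the description, not the original code: facts are assumed to be tuples of a predicate followed by its arguments, and the instance and predicate names are invented.

```python
def collect_facts(root, facts, depth_limit):
    """Collect facts reachable from root within depth_limit links, and
    record the depth at which each instance was first reached."""
    depth = {root: 0}
    frontier = [root]
    collected = []
    for level in range(depth_limit):
        next_frontier = []
        for instance in frontier:
            for fact in facts:
                if instance in fact[1:] and fact not in collected:
                    collected.append(fact)
                    for arg in fact[1:]:
                        if arg not in depth:
                            depth[arg] = level + 1
                            next_frontier.append(arg)
        frontier = next_frontier
    return collected, depth


# Hypothetical knowledge base fragment.
facts = [
    ("near", "Area1", "RiverA"),
    ("location-of", "MissionA", "Area1"),
    ("task-type", "MissionA", "Protect"),
    ("width", "RiverA", "Wide"),
    ("assigned-to", "MissionA", "Unit7"),
    ("commander", "Unit7", "Smith"),   # three links from Area1
]
collected, depth = collect_facts("Area1", facts, depth_limit=2)
```

With a depth limit of two, the fact about the unit's commander is left out because it is only reachable through a third link.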

Second, GSA generalizes the facts, creating a generalized knowledge structure. 
Generalization broadens the scope of the assertions to apply to many cases. GSA 
generalizes by substituting variables for instances. Specifically, variables are 
substituted for all instances linked to the root instance by fewer than d relations 
(i.e., all non-leaf instances). Figure 2 shows the generalization of the figure 1 case. 
By variablizing, other cases can be matched into the structure simply by 
generating bindings for the variables. For example, a case might be similar to EA1 
because it is near a river and is part of a protect task. Without generalization, 
a matching case would have to be near Riverb and part of Mission1. 
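A minimal sketch of the variable substitution is given below. It assumes that each instance's depth from the root is already known, that the root is renamed ?X, and that other non-leaf instances are renamed ?V1, ?V2, and so on; the naming scheme and the sample facts are assumptions made for illustration.

```python
def generalize(facts, depth, depth_limit):
    """Replace every instance reached in fewer than depth_limit links
    (the non-leaf instances) with a variable; the root becomes ?X."""
    variables = {}
    for instance, d in sorted(depth.items(), key=lambda item: item[1]):
        if d == 0:
            variables[instance] = "?X"
        elif d < depth_limit:
            variables[instance] = "?V%d" % len(variables)
    return [(f[0],) + tuple(variables.get(a, a) for a in f[1:]) for f in facts]


# Hypothetical case facts and instance depths for a depth limit of two.
facts = [
    ("near", "Area1", "RiverA"),
    ("location-of", "MissionA", "Area1"),
    ("task-type", "MissionA", "Protect"),
]
depth = {"Area1": 0, "RiverA": 1, "MissionA": 1, "Protect": 2}
generalized = generalize(facts, depth, depth_limit=2)
```

Leaf instances such as Protect stay as constants, so the generalized structure still requires a protect task but no longer names a specific river or mission.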

The second stage of GSA concerns folding in generalized knowledge struc- 
tures from multiple cases. A naive combination strategy would simply attach 
each structure under a common root node. This strategy, however, ignores 
overlap between knowledge structures and creates an unnecessarily complicated 
unification. A better strategy when folding in a new knowledge structure is to only 
add structure that is not present in the unified structure. Unfortunately, finding 
the largest common overlap among knowledge structures is an instance of the 
largest common subgraph problem, which is known to be NP-Hard. Therefore, 
GSA uses a heuristic algorithm to find the common knowledge structure and 
does not guarantee the simplest unification. 

Figure 3 illustrates the GSA structure combination process. GSA starts 
unifying at each root node (?X in figure 2) and moves down the graph attempting 
to align variables. The GSA folding algorithm is as follows. First, sort the 
generalized assertions based on the distance from the root node. The distance of an 
assertion from the root is the smallest number of relational links that connect a 
single argument in the assertion to the root variable. Sorting ensures that GSA 
unifies all variables close to the root before moving to the deeper variables. 

Fig. 3. An example of structure folding in GSA. The figure illustrates folding 
three different knowledge structures into one unified structure. 

Second, for each assertion in the sorted structure attempt to find a matching 
assertion among the unmatched assertions in the unified knowledge structure. 
Two assertions match if they contain the same predicate and there is no conflict 
in their arguments. Argument conflicts occur when a common variable appears 
in different argument positions. The result is a set of variable substitutions to 
complete the match. For each match, GSA deletes the matching clause from the 
sorted structure and propagates the variable substitutions through the remaining 
clauses. Finally, GSA adds all unmatched assertions in the sorted knowledge 
structure to the unified knowledge structure. An unmatched assertion is new 
knowledge that is not currently in the unified structure. 
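The folding steps can be sketched as below. This is one possible reading of the description rather than the GSA code itself. It assumes that both structures share the root variable ?X and otherwise use disjoint variable names, that a variable may only be aligned with a variable, and that two variables of the new structure may not be aligned with the same unified variable. All predicate and variable names are invented.

```python
import math


def is_var(term):
    return term.startswith("?")


def root_distances(structure, root="?X"):
    """Distance of each assertion: fewest links from any argument to root."""
    dist = {root: 0}
    changed = True
    while changed:
        changed = False
        for assertion in structure:
            known = [dist[t] for t in assertion[1:] if t in dist]
            for t in assertion[1:]:
                if known and dist.get(t, math.inf) > min(known) + 1:
                    dist[t] = min(known) + 1
                    changed = True
    return {a: min((dist[t] for t in a[1:] if t in dist), default=math.inf)
            for a in structure}


def try_match(new, old, subst):
    """Return an extended substitution if `new` matches `old`, else None."""
    if new[0] != old[0] or len(new) != len(old):
        return None
    trial = dict(subst)
    for n_arg, o_arg in zip(new[1:], old[1:]):
        if is_var(n_arg) and is_var(o_arg):
            if n_arg in trial:
                if trial[n_arg] != o_arg:
                    return None       # already aligned with another variable
            elif o_arg in trial.values():
                return None           # target variable already taken
            else:
                trial[n_arg] = o_arg
        elif n_arg != o_arg:
            return None               # constants differ, or variable vs constant
    return trial


def fold(unified, new_structure):
    """Fold new_structure into unified, adding only unmatched assertions."""
    distance = root_distances(new_structure)
    subst = {"?X": "?X"}
    used = set()                      # unified assertions already matched
    additions = []
    for assertion in sorted(new_structure, key=distance.get):
        for i, old in enumerate(unified):
            trial = None if i in used else try_match(assertion, old, subst)
            if trial is not None:
                subst = trial
                used.add(i)
                break
        else:
            additions.append(assertion)
    renamed = [(a[0],) + tuple(subst.get(t, t) for t in a[1:]) for a in additions]
    return list(unified) + renamed


unified = [("near", "?X", "?V1"), ("task-type", "?V2", "Protect"),
           ("location-of", "?V2", "?X")]
new = [("near", "?X", "?N1"), ("width", "?N1", "Wide"),
       ("location-of", "?N2", "?X"), ("task-type", "?N2", "Attack")]
folded = fold(unified, new)
```

Applying the final substitution to the unmatched assertions plays the role of propagating variable substitutions through the remaining clauses.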

GSA essentially combines all facts from the cases into a single structure. We 
interpret this structure as the relevant criteria to use when comparing two cases. 

3.2 Building Semantic Signatures 

A key problem with relational case knowledge comes at query time, when a query 
must be compared to all cases. The problem is that structured cases are difficult 
and time consuming to compare. Even with the criteria generated by GSA, 
structured case matching entails finding the largest common overlap between 
two graphs, which as mentioned before is an NP-Hard problem. 

One way to reduce the computational overhead is to use a preprocessing step 
to filter out cases that are unlikely to be relevant and thereby reduce the number 
of calls to the structural matcher. The disadvantage of this approach is that it 
introduces a weaker comparison algorithm that may miss important similarities. 

An alternative strategy is to translate the structural cases into representations 
that can be efficiently compared. One efficient representation is a fixed-length 
vector of floating point numbers. Floating point vectors represent points 
in a multidimensional space and can be compared using simple Euclidean 
distance. The challenge is thus to translate a case represented by logical assertions 
into a continuous vector, while preserving the original semantics. 

[Figure 4: a case is matched against the match criteria R1(?V1,?X), R2(?X,?V2), R3(?X,?V4), R4(?V3,?X), R1(?V2,?V3) to give a match vector, which a neural network maps to a signature such as 0.5, 0.8, 0.2] 

Fig. 4. Transformation of a structural case into a floating point signature. 

The remainder of this section describes an algorithm for performing such a 
transformation using deductive inference and a neural network. The approach, 
shown in figure 4, generates compact, semantically-rich floating point signatures. 

Creating Match Vectors. In the first stage, PowerLoom performs a structural 
match between each case and the match criteria. PowerLoom matches by binding 
variables in the match criteria to instances in the case. Since it is unlikely that a 
case will completely match the match criteria, PowerLoom uses a greedy partial- 
match strategy where it selects bindings that satisfy the greatest number of 
criteria. Thus, it finds the largest overlap between the case and the criteria. 

The result of the match is a set of clauses from the criteria that are satisfied 
and a set that are not. For example in figure 4, only two of five clauses are 
satisfied. By associating a score of 1 and 0 for each satisfied and unsatisfied 
clause respectively, we obtain a pattern over the match criteria. These patterns, 
called match vectors, are binary vectors that represent which criteria are satisfied. 
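A match vector can be computed for a toy case as sketched below. Note the substitution: instead of the greedy partial-match strategy described above, this sketch exhaustively tries every binding, which is only practical for very small cases. The criteria mirror the clauses of figure 4, while the case facts and instance names are invented.

```python
from itertools import product


def match_vector(criteria, case_facts, root):
    """Bind criteria variables to case instances so that as many clauses
    as possible are satisfied, then score each clause 1 or 0."""
    facts = set(case_facts)
    instances = sorted({arg for fact in case_facts for arg in fact[1:]})
    variables = sorted({a for c in criteria for a in c[1:]
                        if a.startswith("?") and a != "?X"})
    best = [0] * len(criteria)
    for values in product(instances, repeat=len(variables)):
        binding = {"?X": root}        # the root variable is the case itself
        binding.update(zip(variables, values))
        vector = [
            int((c[0],) + tuple(binding.get(a, a) for a in c[1:]) in facts)
            for c in criteria
        ]
        if sum(vector) > sum(best):
            best = vector
    return best


criteria = [
    ("R1", "?V1", "?X"),
    ("R2", "?X", "?V2"),
    ("R3", "?X", "?V4"),
    ("R4", "?V3", "?X"),
    ("R1", "?V2", "?V3"),
]
case_facts = [("R2", "c", "b"), ("R1", "b", "d")]   # hypothetical case rooted at c
vector = match_vector(criteria, case_facts, root="c")
```

Binding ?V2 to b and ?V3 to d satisfies the second and fifth clauses, so two of the five criteria score 1.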

Neural Network Feature Weighting. The match vectors provide a fixed- 
length representation that captures the semantics of each case. What is missing, 
however, is a measure of importance for each dimension. Clearly, when comparing 
two cases, some criteria should be weighed more than others. Most approaches 
to feature weighting attach a weight to each feature and then sum the weights 
of the matched features. Our approach uses a neural network rather than linear 
combination to compute the match score. 

The goal of the neural network learning module is twofold. First, it distin- 
guishes the relevant features. Second, it computes a compact representation that 
can be easily compared to other cases. The neural network is a 3-layer feedforward 
network with sigmoid units in the hidden and output layers. The input 
to the network is the match vector, and the target output is the expert’s evaluation 
of that case (0.0 to 1.0). The network is trained using standard backpropagation. 






A trained network correctly evaluates each decisive point from its input pat- 
terns within the match criteria. Thus, it has implicitly learned how to weigh 
the input features. Since the network uses a hidden layer, it has also learned 
to translate the binary input into a lower-dimensional continuous space repre- 
sented by the hidden units. The hidden unit activations capture the features of 
the inputs that are important for evaluating decisive points. Our strategy is to 
use these activations as a semantic signature for each case. 
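
The idea of reading the signature off the hidden layer can be sketched as follows. This is an illustration only: the weights are made-up values rather than trained ones, bias terms are omitted for brevity, and the names `forward` and `distance` are assumptions.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def forward(
    match_vec: Sequence[float],
    w_hidden: Sequence[Sequence[float]],
    w_out: Sequence[float],
) -> Tuple[List[float], float]:
    """Return (hidden activations, output score) for one match vector."""
    hidden = [sigmoid(sum(w * x for w, x in zip(row, match_vec))) for row in w_hidden]
    score = sigmoid(sum(w * h for w, h in zip(w_out, hidden)))
    return hidden, score


def distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two signatures."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Hypothetical network: 4 binary inputs compressed to a 2-dimensional signature.
w_hidden = [[2.0, -1.0, 0.0, 0.5], [-3.0, 0.0, 1.0, 0.0]]
w_out = [1.5, -2.0]

sig_a, score_a = forward([1, 0, 1, 0], w_hidden, w_out)
sig_b, score_b = forward([0, 0, 1, 0], w_hidden, w_out)
gap = distance(sig_a, sig_b)
```

Here the two cases differ in a single input feature, yet their signatures can end up far apart when that feature carries a large weight, which is the behavior a purely additive score cannot express as easily.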

There are several advantages of the neural network signatures. First, they 
are compact representations for each case, which for large case bases signifi- 
cantly reduces the match time. In this problem, the neural networks reduced 
500-dimensional match vectors to 50 dimensions. Second, neural networks can 
weigh mismatch evidence as well as match evidence. Current linear combination 
strategies only propagate positive match evidence from a feature, which in 
this problem is not always valid. For example, two bridges may match perfectly 
except for who actually controls the bridge. This difference should completely 
change the similarity measure. It is unclear how a set of linear weights could make 
such a large distinction from a single mismatched feature. A neural network can 
separate these examples by mapping them into different hidden vectors. 

3.3 Incorporating Rule Knowledge 

The decisive point algorithm uses existing rule knowledge in two important ways. 
First, it explicitly checks if any existing rule knowledge can infer goodness. If a 
rule exists that covers the map feature, it forgoes the case-based strategy and 
uses the rule. An example of this type of knowledge is the rule: linear features 
are bad decisive points. All rivers and roads match this rule. 

A second way that the decisive point algorithm uses rule knowledge is in 
structural case matching. Recall from the previous section that PowerLoom’s 
partial matcher is used to match each case into the match criteria. Structural 
matching entails finding bindings for variables in the match criteria such that 
as many clauses are satisfied as possible. When satisfying a clause, PowerLoom 
uses any available inference rules to generate a deductive proof. For example, 
suppose a clause in the match criteria specifies East(?X, ?Y) but there is no 
corresponding assertion in the case. Suppose that the case does have the assertion 
West(River4, Bridge1). Given the general rule East(X, Y) implies West(Y, X), 
PowerLoom can infer East(Bridge1, River4) and satisfy the above clause. 

Utilizing rule knowledge in case matching compensates for a lack of cases. 
In the previous example, the rule creates a second implicit case from an explicit 
case. There was no case where East(Bridge1, River4) was explicitly asserted, 
but the rules allowed PowerLoom to infer one. Experiments in the evaluation 
section confirm the importance of rule knowledge when cases are limited. 
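
The effect of such a rule on a case can be sketched in a few lines. This is not PowerLoom's inference engine: it assumes the East/West rule is applied in both directions, and the names `close_under_inverses` and `inverse_of` are hypothetical.

```python
from typing import Dict, Set, Tuple

Fact = Tuple[str, str, str]  # (relation, first argument, second argument)


def close_under_inverses(facts: Set[Fact], inverse_of: Dict[str, str]) -> Set[Fact]:
    """Add the inverse assertion for every fact whose relation has a known inverse."""
    closed = set(facts)
    for relation, a, b in facts:
        if relation in inverse_of:
            closed.add((inverse_of[relation], b, a))
    return closed


# The case only asserts West(River4, Bridge1).
case = {("West", "River4", "Bridge1")}
inverse_of = {"East": "West", "West": "East"}
expanded = close_under_inverses(case, inverse_of)
```

After the closure step the case also contains East(Bridge1, River4), so a match criterion phrased with East can be satisfied even though no such assertion was entered by hand.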

4 Evaluation 

To test our approach, we participated in an official HPKB evaluation conducted 
by the Alphatech Corporation. Alphatech provided expert descriptions of 50 decisive 
point cases. Each case was modeled using terms from ontologies produced 
within HPKB, which includes a portion of the Cyc knowledge base. The cases included 
background knowledge about the scenario, including the spatial relations 
between map features and military units. 

Table 1. Results from the HPKB evaluation. Scores range from 0 to 100. 

Problem    1    2    3    4    5 
Best     100  100  100   80  100 
Average  100  100   80   40   95 

Fig. 5. Learning curves for different variations of the case-based reasoner. 

Alphatech provided five test problems. Table 1 summarizes our performance 
as determined by experts. Our algorithm returns three decisive points, and Alphatech 
reported the quality of both the best of the three and the average of the 
three. The table shows that in every problem, we returned a quality decisive 
point. The lowest “best” score was 80% in problem 4. This encouraging result 
shows that the reasoner recognizes the best decisive points. 

Unfortunately, the average scores show that the system was not as discriminating 
with poor decisive points. In problem 4, the system did return the optimal 
decisive point, but it also returned two poor decisive points. From the evaluation 
numbers and expert opinions, it is clear that the system is overgeneralizing. It 
does not miss the best points, but it does not always reject the bad ones. One 
solution is to provide more negative examples. 

To complement the Alphatech evaluation, we ran additional in-house experiments 
to judge performance and measure the utility of the rule-based knowledge. 
Figure 5 shows the results of several 10-fold cross-validation experiments over 
the case base with and without rule knowledge. The learning curves plot the rate 
at which performance increases with the size of the case base. The top curve rep- 
resents the full system with all available rule knowledge. The middle curve plots 
performance without knowledge specific to decisive points such as the fact that 
rivers are not good decisive points. The bottom curve plots system performance 
without domain-general rules such as general geospatial relationships. 






The shapes of the curves are somewhat surprising and do not resemble typical 
machine learning curves. The most striking feature is that the learning rate 
actually increases with experience. There is virtually no difference in performance 
with case base sizes of 5 to 30 examples, but after 30 the performance grows 
approximately linearly with the size of the case base. Unfortunately, we have not 
been able to come up with a concrete explanation for the shape of the curves. 
One possible hypothesis is that they are an artifact of the cases. It is unclear 
how representative our case distribution is over the actual problem distribution. 

The results show a significant advantage to case-based reasoning with rule-based 
knowledge. At every level of case knowledge, performance improves with 
rules, which supports our hypothesis that the rules compensate for a lack of 
cases. The knowledge-poor approaches need more cases to achieve the same 
level of performance as the knowledge-rich approach. Interestingly, there was a 
greater drop-off without the general rules than without the domain-specific rules. 
Since general rules apply to more situations, their impact is felt more often and 
thus when they are removed the system suffers a greater performance hit. 

The overall assessment by the military experts is that our approach is the 
most promising solution to the decisive point problem to date. The system is 
currently not strong enough to serve in real planning efforts, but the experts 
agree that currently the limiting factor is the lack of cases. Given more cases 
(especially negative examples) and additional rule-knowledge, the knowledge- 
rich case-based reasoner could provide a valuable military planning tool. 

5 Conclusion 

Determining decisive points is a challenging problem, where the lack of expert 
rules precludes an expert system solution. Case-based reasoning is a promising 
alternative, since knowledge comes from examples rather than rules. However, 
traditional CBR methods are inadequate because they cannot match structured 
cases and do not utilize existing rule knowledge. We addressed each of these 
challenges in a new case-based reasoner within the PowerLoom system. Our 
case-based reasoner combines nearest neighbor, graph search, neural networks, 
and natural deduction to learn match criteria, perform structural matches, and 
incorporate existing knowledge. Experiments in the HPKB program show that 
this approach is more effective than any other decisive point solution to date. 

References 

Emde. Emde, W. (1996). Relational instance-based learning. In Proceedings of the 
13th International Conference on Machine Learning. 

Gebhardt. Gebhardt, E. (1997). Survey on structure-based case retrieval. The Knowledge 
Engineering Review, 41-58. 

Jones. Jones, E. (1999). HPKB course of action challenge problem specification. Tech. 
rep., Alphatech Inc. 

Kolodner. Kolodner, J. L. (1993). Case-Based Reasoning. Morgan Kaufmann. 



A Constraint-Based Approach to Simulate 
Faults in Telecommunication Networks 



Aomar Osmani and Francois Levy 

Laboratoire d’Informatique de Paris-Nord 
Avenue J.-B. Clement, France-93430 Villetaneuse 

ao@lipn.univ-paris13.fr 



Abstract. To study the consequences of fault situations in telecommu- 
nication management networks (TMN), we have proposed a model-based 
diagnosis approach based on the CSP paradigm. First the TMN is mod- 
eled, then a set of breakdown situations is simulated in order to build a 
fault-training base. Finally, a rule-based system is generated and used to 
detect faults in the real system. In this work, we are interested in the 
simulation process. 

Fault simulation comes down to propagating through the network the 
information emitted by the faulty components. To take into account 
the inaccuracies of transmissions in the network and the event processing 
times, we represent temporal knowledge by time intervals. 

The components are modeled by extended transducers. Consequently, the 
behavior of the components is sensitive to the order in which events 
occur. To reason about these behaviors, a CSP formalism 
is studied. The main aim is not to compute one possible behavior of a 
component but to compute all the behaviors, making it possible to build 
a fault-training database. Some polynomial algorithms are proposed to 
compute solutions and to update temporal constraints. 

1 Introduction 

Fault management is a fundamental task in the telecommunication management 
process. The main goal is to collect and interpret alarm messages and failure 
indications from a network element without human intervention. The complexity 
of networks calls for Artificial Intelligence techniques to assist the 
operators in fault management tasks. 

Previously, most systems employing Artificial Intelligence techniques for fault 
management were expert systems or production rule systems [4,19]. These formalisms 
are still used to partially manage several networks. Typically, these 
methods associate sequences of observations with a given fault situation. These 
techniques are well suited to diagnosing systems whose configuration and 
components rarely change. Alternatively, various techniques such as neural networks 
[18], Petri nets [1], case-based techniques [8], fault dictionaries [14], 
coding approaches [20] and model-based diagnosis approaches [2,13] have been 
proposed for diagnosis. In contrast to expert systems, the model-based diagnosis 
approaches are known to be more adaptable to evolving systems 
and to give a good explanation of fault situations. 



R. Loganantharaj et al. (Eds.): IEA/AIE 2000, LNAI 1821, pp. 463-474, 2000. 
© Springer-Verlag Berlin Heidelberg 2000 






We proposed, within the CASPAR project [13]¹, a model-based diagnosis 
formalism for fault diagnosis in the largest French data-switching network 
(TRANSPAC). Our system is composed of two modules: an off-line module, 
which simulates fault situations in the model of the network and learns their 
characteristics, and an on-line module, which supervises the behavior of the network 
in order to recognize learned faults. Preliminary results of this project are 
presented in [12,9,13,10]. 




Fig. 1. (a) Principles of our approach to fault diagnosis. (b) Example of a management 
network: the TRANSPAC network. 

A telecommunication network (TN) is composed of a set of components con- 
nected to each other. The role of the telecommunication management network 
(TMN) is to maintain the TN in good working order. The structure of the TMN is 
hierarchical, as shown in figure 1(b). 

The simulation of breakdown situations consists of propagating the 
information emitted by the faulty components. To take into account the 
transmission delay and the uncertainties of data processing in the network, we represent 
temporal knowledge by time intervals. Thus, if a switch takes t units of 
time to process data, this time will be denoted by [t − Δt, t + Δt]. Taking 
this kind of information into account generates, for each simulated fault, a very 
large number of possible component behaviors. 

Section 2 briefly presents the formalism we use to diagnose faults in telecommunication 
networks. Section 3 introduces the simulation module. Section 4 
describes the simulation module as a CSP and gives two algorithms: 
the first computes all possible solutions of the CSP without backtracking, while 
the second generates models of scenarios. 

2 Problem Description 

To develop a supervision system for detecting faults in a telecommunication 
network, we propose a model-based diagnosis technique (MBD). The telecommunication 
network is a dynamic and complex system providing various functionalities. 
To deal with the management task, we extract from the network: 

¹ CASPAR (Gestion d’Alarmes par Simulation de PAnnes sur Réseaux de 
Télécommunications) is carried out in collaboration with the CNET (Centre National 
des Télécommunications), under the CNET/LIPN/IRISA 97 IB 437 project. 






— a structural model describing the topology of the network. The structural 
model is defined by a temporal graph Mr(C, L), where C is the collection 
of physical components of the system and L is a set of connections between 
these components; 

— a functional model describing the simplified operation of each element of the 
network. The functional model of each component is defined by a temporal 
communicating finite-state machine. The automaton defines, for each 
component, the possible system states and the messages it can exchange with 
other components. 

Once the model is built, a set of indexed breakdown situations is simulated. 
The simulation task builds a training database associating each breakdown 
situation with the event sequences it produces. This training database is used 
to discriminate faults. The result is 
used directly by the telecommunication management network to recognize faults 
in the managed networks. 

3 Simulation 



The simulation of the fault Pi consists of the asynchronous release of the virtual 
component cpi, which simulates the primary causes of the fault Pi. Thus, in the 
case of figure 2(a), we suppose that components CT1, CM1, CM3 take 
part in the release of the fault P1. To simulate P1, events are emitted from 
cp1 towards the components CT1, CM1, CM3. These components process the 
events and then propagate the effects to the other components of the network. 
This process continues until all the effects are received by the supervisory center 
SCS. 





Fig. 2. (a) Example of fault situation (b) The simulation graph of the fault Pi. 

3.1 Principle of Simulation 

Let {P1, P2, ..., Pn} be a set of breakdown situations to be simulated 
and Mr a model of the network. The simulation model Mrs is built by adding 
the virtual components {cp1, cp2, ..., cpn} to Mr (see figure 2(a)). 

Definition 1. A component C is in the active state for the breakdown situation 
Pi if and only if C is on a path between the component cpi and the supervisor 
center SCS. 

Definition 2. A simulation graph MRSi = (Csi, SCS, Pi, Asi) associated with 
the fault Pi is a graph containing the nodes SCS and cpi and all components in the 
active state for the situation Pi. Asi is a set of arcs connecting the nodes of Csi. 
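
Definitions 1 and 2 can be illustrated with a short sketch. It assumes the model is available as a list of directed arcs and reads "on a path between cpi and SCS" as "reachable from cpi and able to reach SCS"; the component names and the function `active_components` are hypothetical.

```python
from typing import Dict, Iterable, List, Set, Tuple


def reachable(graph: Dict[str, List[str]], start: str) -> Set[str]:
    """Nodes reachable from start by depth-first search (start included)."""
    seen = {start}
    stack = [start]
    while stack:
        node = stack.pop()
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def active_components(
    arcs: Iterable[Tuple[str, str]], source: str, sink: str = "SCS"
) -> Set[str]:
    """Nodes lying on some directed path from the virtual component to the supervisor."""
    forward: Dict[str, List[str]] = {}
    backward: Dict[str, List[str]] = {}
    for a, b in arcs:
        forward.setdefault(a, []).append(b)
        backward.setdefault(b, []).append(a)
    return reachable(forward, source) & reachable(backward, sink)


# Hypothetical model: CT9 never reaches SCS, CM4 is not reached from cp1.
arcs = [
    ("cp1", "CT1"), ("cp1", "CM1"), ("CT1", "CM2"), ("CM1", "CM2"),
    ("CM2", "SCS"), ("CM4", "SCS"), ("CT1", "CT9"),
]
active = active_components(arcs, "cp1")
```

The node set of the simulation graph is then `active`, and its arcs are the arcs of the model whose two ends both belong to that set.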






Example 1. The simulation graph corresponding to the example of figure 2(a) 
is given in figure 2(b). In this graph only the active components for fault Pi are 
modeled. 

3.2 Assumptions 

— The simulation graph is a labeled graph made up of the model Mr and the 
virtual components modeling each fault to be simulated; 

— The sole role of the simulation components is to send dated messages 
reproducing the primary causes of fault situations. If no breakdown is simulated, 
the network must be in an at-rest state (i.e., no event circulates in 
the network); 

— The components are modeled by temporal extended transducers. They 
process events one at a time. Processing an event consists of changing 
the current state, sending messages to neighboring components and changing 
the context of the component. In this paper, the components’ behavior is 
considered to be a black box; 

— Events propagate from one component to another sequentially, 
event by event. The exchange of events between two components is 
not instantaneous. The duration of the propagation is modeled by a time 
interval; 

— The supervisor center (SCS) is a particular component. Its role consists of 
collecting the sequences of events that it receives during the release of breakdown 
situations. It does not perform any other task. 
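
The interval view of durations in these assumptions can be sketched as follows. The arithmetic below (adding the bounds of an emission window and of a delay) is an assumption made for illustration; the paper only states that durations are modeled by intervals, and the names `around` and `propagate` are hypothetical.

```python
from typing import Tuple

Interval = Tuple[float, float]  # (earliest, latest)


def around(t: float, dt: float) -> Interval:
    """Interval for a nominal duration t known with an uncertainty of dt."""
    if dt < 0:
        raise ValueError("uncertainty must be non-negative")
    return (t - dt, t + dt)


def propagate(emission: Interval, delay: Interval) -> Interval:
    """Arrival window of a message emitted within `emission` and delayed by `delay`."""
    return (emission[0] + delay[0], emission[1] + delay[1])


transmission = around(5.0, 1.0)
arrival = propagate((10.0, 12.0), transmission)
```

Each hop widens the window, which is why the number of possible orders at the input of a component grows as events travel through the network.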

4 Simulation and Time Representation 

The temporal constraints intervene at two levels in the simulation process: at 
the deduction level (to reason about the component behavior) and at the propa- 
gation level (to reason about messages exchanged between components). In this 
paper we are interested in the temporal constraints at the propagation level. 

To reason about temporal information, and to ensure the completeness of the alarm 
sequences relating to the release of a breakdown situation, we complete the simulation 
model with a mathematical module, which deals with temporal information. 
This module computes, at the input of each component, all consistent orders of 
events received from the other components. Figure 3 shows the position of this 
module in the model of a component. 




Fig. 3. The component C receives a set of sequences of events from C1, ..., Cm 






The component C can receive several events arriving from several components 
(see figure 3). To ensure completeness, or to allow the best sequences to be 
taken into account, we propose to study the case where the propagations 
occur with uncertain durations: the temporal parameters will be modeled 
by intervals. 

4.1 Event Propagation 

Consider the component C of the model Mrs. C receives a sequence of $n_i$ events 
$E_i = \{e_{i1}, \ldots, e_{in_i}\}$ from each component $C_i$, where $i = 1..n$. In addition, 
let $E = \{E_1, \ldots, E_n\}$ be the set of all events received by C. 

Computing all possible sequences of events at the input of a component is equivalent 
to computing all the consistent possible orders between elements of E. The 
theoretical complexity of this problem is the same as that of the propagation 
of events with precise durations: if all the intersections of the intervals 
associated with the events are empty, then there exists only one possible sequence. If, on 
the other hand, the intersection of every pair of intervals associated with the 
events is not empty, the number of sequences is equal to the number of possible 
arrangements of the sequences of events. 
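
To give an order of magnitude for the second case, the sketch below counts the arrangements under one added assumption: the events coming from the same component keep their emission order, so an arrangement is an interleaving of the incoming sequences. The function name `interleavings` is hypothetical.

```python
from math import factorial
from typing import Sequence


def interleavings(lengths: Sequence[int]) -> int:
    """Number of ways to merge sequences of the given lengths while keeping each one's order."""
    total = factorial(sum(lengths))
    for n in lengths:
        total //= factorial(n)
    return total


# One event from a first component, two from a second, three from a third.
count = interleavings([1, 2, 3])
```

With only six events and fully overlapping intervals there are already 60 candidate sequences, which motivates the compact scenario models introduced later.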



4.2 Temporal Reasoning Formalism 

We associate with each event $e_{ij}$ a temporal label $I_{ij} = [I_{ij}^-, I_{ij}^+]$. $I_{ij}$ defines the 
period of time during which $e_{ij}$ arrives from the node $C_i$ to the node $C_j$. 

We propose to solve this problem using the TCSP (Temporal Constraint Satisfaction 
Problem) formalism. Specifically, we will use the point algebra proposed 
by Vilain and Kautz [16]. In this algebra three basic relations are possible: <, = 
and > (e.g., $e_i < e_j$ means that the event $e_i$ occurs before the event $e_j$). A union 
of basic relations represents incomplete knowledge. The point algebra is made 
up of the relations $\{\emptyset, <, =, >, \le, \ge, \ne, ?\}$ and the fundamental operations of 
converse, intersection and composition. 
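
The three operations can be made concrete with a small sketch. The encoding of a relation as a set of basic symbols is a choice made here for illustration; the composition table is the standard one for time points.

```python
from itertools import product
from typing import FrozenSet

Relation = FrozenSet[str]  # a subset of {"<", "=", ">"}
ALL: Relation = frozenset("<=>")  # the universal relation, written "?" in the text

_CONVERSE = {"<": ">", "=": "=", ">": "<"}
_COMPOSE = {
    ("<", "<"): "<", ("<", "="): "<", ("<", ">"): "<=>",
    ("=", "<"): "<", ("=", "="): "=", ("=", ">"): ">",
    (">", "<"): "<=>", (">", "="): ">", (">", ">"): ">",
}


def rel(symbols: str) -> Relation:
    """Build a relation from a string of basic symbols, e.g. rel("<=")."""
    return frozenset(symbols)


def converse(r: Relation) -> Relation:
    return frozenset(_CONVERSE[s] for s in r)


def intersection(r: Relation, s: Relation) -> Relation:
    return r & s


def compose(r: Relation, s: Relation) -> Relation:
    """Relation between x and z given x r y and y s z."""
    out = set()
    for a, b in product(r, s):
        out.update(_COMPOSE[(a, b)])
    return frozenset(out)
```

For instance, composing "<" with "<=" yields "<", whereas composing "<" with ">" yields the universal relation, meaning nothing can be concluded about the two end points.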

Definition 3. The temporal constraint satisfaction problem P is defined by a 
set of variables $V = \{V_1, \ldots, V_n\}$, where each $V_i$ is defined in the domain $I_i$. 
An instance of V is a tuple of temporal points $(v_1, \ldots, v_n)$. The constraints of 
the problem are defined by the relations between the definition domains of the 
variables: a binary constraint between two variables $V_i$ and $V_j$ will be represented 
by the matrix of relations $A_{ij} = (a_{11}, a_{12}, a_{21}, a_{22})$ between the ends of the convex domains 
$I_i$ and $I_j$. For example: $I_i^- \, a_{11} \, I_j^-$, $I_i^- \, a_{12} \, I_j^+$, $I_i^+ \, a_{21} \, I_j^-$, $I_i^+ \, a_{22} \, I_j^+$. 

Definition 4. A binary constraint $R_{ij}$ between two variables $V_i$ and $V_j$ is a 
disjunction of the relations <, =, >. The interpretation of $R_{ij}$ is: 

$\forall v_i \in I_i,\ \forall v_j \in I_j,\ \forall a \in \{<, =, >\}:\ v_i \, a \, v_j \Rightarrow a \cap R_{ij} = a$ 

To represent and to reason about binary relations we use a constraint network. 
We find the same formalism of reasoning in Dechter, Meiri and Pearl [3], Ladkin 
and Maddux [6] and in van Beek [15]. 






Definition 5. A binary constraint network is a directed graph Rc where 
the nodes correspond to the variables of V, and arcs correspond to the binary 
constraints between the pairs of variables. A network Rc is consistent if there 
exists a consistent instantiation of all the variables associated with the nodes 
of Rc. 



Modeling of the Events Scheduling Problem. The variables are the set 
of events $\{E_{11}, \ldots, E_{1n_1}, \ldots, E_{n1}, \ldots, E_{nn_n}\}$ received by each component C 
during the simulation of the fault $P_i$ (see figure 3). The domain of 
each variable $E_{ij}$ is defined by a temporal interval $I_{ij} = [I_{ij}^-, I_{ij}^+]$. Two constraints 
inherent to the application and independent of the variable domains are defined: 

1. the events exchanged directly between two components arrive in the order 
of emission: $\forall i, j;\ 1 \le i \le n,\ 1 \le j \le n_i - 1;\ E_{ij} < E_{i(j+1)}$ 

2. the component processes the events sequentially: $\forall i, j, k, l;\ 1 \le i, k \le n,\ 1 \le j \le n_i,\ 
1 \le l \le n_k;\ (E_{ij} < E_{kl}) \lor (E_{ij} > E_{kl})$ 

The other constraints are extracted from the variable domains by translating 
from the relations between domains towards relations between variables. 



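Constraints translation. $\forall i, j, k, l;\ i \ne k;\ i, k = 1..n;\ j = 1..n_i;\ l = 1..n_k$: 

(1) If $I_{ij} \cap I_{kl} \ne \emptyset$ then: if $I_{ij}^+ = I_{kl}^-$ then $E_{ij} \le E_{kl}$; else 
if $I_{ij}^- = I_{kl}^+$ then $E_{ij} \ge E_{kl}$; else $E_{ij} \,?\, E_{kl}$; 

(2) If $I_{ij} \cap I_{kl} = \emptyset$ then: if $I_{ij}^+ < I_{kl}^-$ then $E_{ij} < E_{kl}$; else 
if $I_{ij}^- > I_{kl}^+$ then $E_{ij} > E_{kl}$. 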
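
The translation from a pair of domains to a relation between the two variables can be sketched as below. It is written under stated assumptions: domains are closed intervals given as (lower, upper) pairs, relations are encoded as sets of the basic symbols, and the function name `relation_from_domains` is hypothetical.

```python
from typing import FrozenSet, Tuple

Interval = Tuple[float, float]  # (lower bound, upper bound), closed on both sides


def relation_from_domains(a: Interval, b: Interval) -> FrozenSet[str]:
    """Set of basic relations that may hold between a point of a and a point of b."""
    a_lo, a_hi = a
    b_lo, b_hi = b
    if a_hi < b_lo:  # a lies entirely before b
        return frozenset("<")
    if a_lo > b_hi:  # a lies entirely after b
        return frozenset(">")
    if a_lo == a_hi == b_lo == b_hi:  # both domains are the same single point
        return frozenset("=")
    if a_hi == b_lo:  # the domains only touch, a first
        return frozenset("<=")
    if a_lo == b_hi:  # the domains only touch, b first
        return frozenset(">=")
    return frozenset("<=>")  # genuine overlap: no information


r_disjoint = relation_from_domains((1, 3), (4, 7))
r_overlap = relation_from_domains((2, 6), (4, 7))
r_touching = relation_from_domains((1, 4), (4, 7))
```

Only disjoint or touching domains contribute an ordering constraint; overlapping domains leave the pair unordered, and the application-specific constraints then decide which orders remain.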

Example 2. In figure 4, component C receives one event from component C1, 
two events from C2 and three events from C3. 

Fig. 4. Example of events configuration at the input of the component C 



Definition 6. A scenario $S_t$ of the variables of V is a sequence denoted 
$(V_1, \ldots, V_n)$, i.e., $(\forall i)(V_i < V_{i+1})$. $S_t$ is a complete scenario if and only if the 
number of variables in the sequence is equal to the cardinality of V. We introduce 
the concept of models of a scenario, which makes it possible to give a compact 
representation of a set of scenarios. 

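Definition 7. A model of scenarios $S_p$ of variables of V is a sequence 
of sets denoted $(\{V_{11}, \ldots, V_{1n_1}\}, \ldots, \{V_{n1}, \ldots, V_{nn_n}\})$ where: 

(1) $(\forall i)(\forall k)(\exists j)(\exists l)\ V_{ik} = E_{jl}$ ²; 

(2) $(\forall i = 1..n)(\forall j = 1..n_i)(\forall k = 1..n_i)\ V_{ij} \,?\, V_{ik}$; 

(3) $(\forall i = 1..n)(\forall k = 1..n_i)(\forall l = 1..n_{i+1})\ V_{ik} \,\{<, ?\}\, V_{(i+1)l}$; 

(4) $(\forall i = 1..n)(\forall j = i+1..n)\ ((\exists l)(\exists m)(V_{il} = V_{jm}) \Rightarrow (\forall k = i..j)(\exists s)(V_{ks} = V_{il}))$. 

² $E_{ij}$ is the variable associated with the event $e_{ij}$. 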

Let us consider a set of variables. The problems which we will consider are: 
(1) building the constraints network; (2) checking the consistency of the network; 
(3) computing all the consistent and complete scenarios or some subset 
of such scenarios; (4) computing all consistent and complete scenario models 
or some subset of such models. 

Construction of the Constraints Network. In accordance with figure 3, each 
component C receives a set of sequences of events $\{E_1, \ldots, E_m\}$ ³ from the components 
$C_1, \ldots, C_m$, respectively. 

Table 1 gives the relations between the domains of definition of the events. The 
relations $A_{ij,kl}$ define the matrix of atomic relations between the ends of the 
intervals $I_{ij}$ and $I_{kl}$. 



            I_11         ...  I_1n_1       ...  I_m1   ...  I_mn_m 
I_11        EQ           ...  A_11,1n_1    ...         ...  A_11,mn_m 
...                      ... 
I_1n_1      A_1n_1,11    ...  EQ           ... 
...                                        ... 
I_m1                                            EQ 
...                                                    ... 
I_mn_m                                                      EQ 

Table 1. The matrix of relations between variable domains associated with events 

Algorithm 1. The algorithm receives the constraints between the variable domains 
(see the matrix in table 1) and generates a constraints network. This algorithm 
consists of three steps. 

1. The constraints translation rules described in section 4.2 are applied and 
the constraints network between variables is built. Then, 
the application-specific constraints defined in section 4.2 are applied; 

2. The network is simplified by applying the following rules: 

(1) $(\forall i)(\forall j)(\forall k,\ k > j)\ R_{ij,ik} = \{<\}$; (2) transform the equality relation:⁴ 
$(\forall i)(\forall j)\ (\{=\} \subset R_{ij}) \Rightarrow R_{ij} \leftarrow (R_{ij} \cup \{>, <\}) \setminus \{=\}$; 

3. Finally, all the arcs labeled by the relations “>” and “≠” are removed from 
the network. This is because the relation < is the symmetrical relation of the 
relation >, and the relation ≠ does not provide any additional information. 

³ $(\forall i)\ E_i = \{e_{i1}, \ldots, e_{in_i}\}$ and $n_i = n$. 

⁴ van Beek [15] proposed the construction of a condensed network allowing the 
regrouping of the nodes connected by the relation =. This new structure makes it 
possible to reduce the complexity in the case of the search for a consistent scenario 
($O(n^2)$). However, it is not efficient if we wish to generate all consistent scenarios. 






Example 3. The constraints network of figure 5 is built from the example in figure 4. 




Fig. 5. A constraints network generated by the algorithm 1. 

Remark: if the relations in the network belong to the set {<, >, ?}, then checking 
the network consistency is done with a complexity $O(n^2)$ (Knuth [5]). van 
Beek [17] showed that the topological sorting suggested by Knuth still applies when 
a further relation is added, and that this does not change the complexity of checking 
consistency. 
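
For a network that only carries "<" arcs, the consistency check by topological sorting can be sketched with the Python standard library (`graphlib`, available from Python 3.9). The dictionary layout, mapping each event to the set of events that must precede it, and the function name `is_consistent` are assumptions of this sketch.

```python
from graphlib import CycleError, TopologicalSorter
from typing import Dict, List, Set, Tuple


def is_consistent(before: Dict[str, Set[str]]) -> Tuple[bool, List[str]]:
    """Return (True, one consistent order) or (False, []) if the '<' arcs form a cycle."""
    try:
        order = list(TopologicalSorter(before).static_order())
    except CycleError:
        return False, []
    return True, order


ok, order = is_consistent({"e32": {"e31"}, "e33": {"e32"}, "e11": {"e31"}})
cyclic, _ = is_consistent({"a": {"b"}, "b": {"a"}})
```

A cycle among strict precedences means no instantiation of the variables can satisfy them all, so the network is inconsistent; otherwise any topological order is one consistent scenario.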



Generation of the Scenarios. We are interested in the generation of two kinds 
of scenarios: 

— The events at the input of the components. These events are treated in the 
order of arrival. To study all the behaviors of the components it is necessary 
to simulate all the possible orders between events. We compute, at each 
component, all possible scenarios; 

— The events at the input of the supervisor (SCS). These events are expressed 
by partially ordered sequences. These scenarios are called generic scenarios. 
One or more interpretations of the model is possible. 

Algorithm to generate a single scenario. This algorithm seeks a scenario in the 
constraints network without backtracking. To allow deductions as precise as 
possible in the components, the algorithm recomputes the domains of occurrence of 
the events (the domains of definition of the variables). 

Algorithm 2 Generation of a single scenario 

Begin 
  A = {roots of the constraints network}; 
  sequence = (); 
  $I_{courant} = [0, 0]$; 
  While (A ≠ ∅) do 
    $I_{courant}^{+} = \min_{E_k \in A} (I_k^{+})$; 
    Take an unspecified node $E_j$ from A; 
    $I_{courant}^{-} = \max(I_j^{-}, I_{courant}^{-})$; 
    $I_j = I_{courant}$; 
    sequence = (sequence, $E_j$); 
    A = (A \ {$E_j$}) ∪ {constraints sub-network roots} 
  endwhile 
End 


Fig. 6. Instance of the events at the input of the component C. 



Example 4. Consider the example of figure 6. The constraints graph corresponding 
to this situation is given in figure 5. Table 2 illustrates 
algorithm 2 on this example. 



Step  roots             Icourant  sequence 
1     {e31}             [0,0]     () 
2     {e32, e21, e11}   [1,3]     (e31) 
3     {e32, e21}        [4,6]     (e31, e11) 
4     {e21, e33}        [4,6]     (e31, e11, e32) 
5     {e22, e33}        [5,10]    (e31, e11, e32, e21) 
6     {e33}             [9,11]    (e31, e11, e32, e21, e22) 
7     ∅                 [9,11]    (e31, e11, e32, e21, e22, e33) 

Table 2. The instance of the scenario with updated variable domains generated by 
algorithm 2 is: (e31[1,3], e11[4,6], e32[4,6], e21[5,10], e22[9,11], e33[9,11]) 

Generation of all scenarios 

The generation of all the possible scenarios makes it possible for the components 
to simulate all the consistent sequences that they can receive during the simulation 
of fault situations. 



Algorithm 3 Generation of all scenarios 

Initialization: sequence = (); $I_{courant} = [0, 0]$; 
                A = {roots of the constraints network} 

Procedure Scenarios-all(A, $I_{courant}$, sequence) 
Begin 
  If (A = ∅) then write sequence; else 
    $I_{courant}^{+} = \min_{E_k \in A} (I_k^{+})$; 
    While ($E_j \in A$) do 
      $I_{courant}^{-} = \max(I_j^{-}, I_{courant}^{-})$; 
      $I_j = I_{courant}$; 
      sequence = (sequence, $E_j$); 
      A = (A \ {$E_j$}) ∪ {roots of the sub-network}; 
      Scenarios-all(A, $I_{courant}$, sequence) 
    endwhile 
End 
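
The exhaustive generation can be sketched as a recursive enumeration. This is an illustration under assumptions, not the authors' implementation: the network is a map from each event to its required predecessors, branches whose time window becomes empty are pruned (a check added here), and the name `all_scenarios` is hypothetical.

```python
from typing import Dict, List, Set, Tuple

Interval = Tuple[int, int]
Scenario = List[Tuple[str, Interval]]


def all_scenarios(domains: Dict[str, Interval], before: Dict[str, Set[str]]) -> List[Scenario]:
    """Enumerate every complete scenario compatible with the precedences and domains."""
    results: List[Scenario] = []

    def extend(placed: Set[str], sequence: Scenario, cur_lo: int) -> None:
        if len(sequence) == len(domains):
            results.append(list(sequence))
            return
        roots = [e for e in domains if e not in placed and before.get(e, set()) <= placed]
        if not roots:
            return  # cyclic precedences: nothing to enumerate on this branch
        cur_hi = min(domains[e][1] for e in roots)
        for event in roots:
            lo = max(domains[event][0], cur_lo)
            if lo > cur_hi:
                continue  # assumption: drop orders that leave an empty time window
            placed.add(event)
            sequence.append((event, (lo, cur_hi)))
            extend(placed, sequence, lo)
            sequence.pop()
            placed.remove(event)

    extend(set(), [], 0)
    return results


# Three events with wide domains; only "a before b" is required.
wide = {"a": (0, 10), "b": (0, 10), "c": (0, 10)}
orders = [[name for name, _ in s] for s in all_scenarios(wide, {"b": {"a"}})]

# Two events with disjoint domains: only one order survives the pruning.
tight = all_scenarios({"x": (0, 2), "y": (5, 8)}, {})
```

The number of results grows quickly with the amount of overlap between domains, which is the cost the scenario models of the next paragraph are meant to avoid at the supervisor.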






Generation of the models of scenarios Unlike the network’s components, which 
are sensitive to the order of input events, the supervisor component (SCS) simply 
indexes them. To reduce the complexity generated by the exhaustive generation 
of all the scenarios and to facilitate the analysis of the effects of the simulated 
breakdown situations, the supervisor builds a set of scenario models. The fol- 
lowing algorithm generates these models and updates all temporal domains of 
events without backtracking. 

Algorithm 4 Generation of the set of scenario models 



NB: Let us consider G' a constraint sub-network of Rc, 

We denote A the set of roots of G' . 

If the variable Eij is a root of G' then it will be denoted Aij . 

Al = {Eu/ k^i}r\ {G'/A). A = US?(A"). 

Procedure model(Rc) 
Begin 
  A = {roots of the constraints network}; 
  sequence = (); 
  While (A ≠ ∅) do 
    $I_{courant}^{-} = \min(I_i^{-})$ (if (A = ∅) then $I_{courant}^{+} = \max(I_i^{+})$); 
    $I_{courant}^{+} = \min(\ldots)$; 
    Pred = ∅; 
    For all ($E_i \in A$) do 
      If ($I_i^{+} < I_{courant}^{+}$) then (Pred = Pred ∪ $E_i$); 
      If ($A_i = \emptyset$) then (Pred = Pred ∪ $E_i$); 
      $I_i^{-} = \max(I_i^{-}, I_{courant}^{-})$; 
    sequence = (sequence, A); 
    A = (A \ Pred); 
    For all ($E_i \in A$) do $I_i^{-} = I_{courant}^{+}$; 
    A = A ∪ {roots of the sub-network}; 
End 

Example 5. Consider the example of figure 6. The constraint network 
corresponding to this situation is given in figure 5. 



Step  Roots               Icourant  sequence 
1     {e31}               [ , ]     () 
2     {e32, e11, e21}     [1,4]     (e31[1,3]) 
3     {e21[8,10], e33}    [2,8]     (e31, {e32[2,6], e11[4,7], e21[5,8]}) 
4     {e33[9,11], e22}    [8,9]     (e31, {e32, e11, e21}, {e21[8,10], e33[8,10]}) 
5     ∅                   [9,12]    (e31, {e32, e11, e21}, {e21, e33}, {e33[9,11], e22[9,12]}) 

Table 3. Illustration of algorithm 4 on the example of figure 6. 






5 Conclusion 

This paper considers a simulation-based approach to understanding the behavior 
of a telecommunication network when faults occur. The inaccuracies of the 
propagations and the processing times are modeled by time intervals. The 
behavior of the components depends on the order of arrival of the events. 
It is therefore necessary to generate, at the component level, all the consistent 
scenarios. 

This article proposes two kinds of algorithms. The first calculates all the possible 
scenarios from the inputs of a component. The second calculates (in linear time) 
a compact representation of all the possible scenarios (model of scenarios). The 
checking of the consistency of a scenario in the model is also done in linear time. 
The models of scenarios can be generated usefully only at the supervisor center. 
Our current research aims to build compact representations of the scenarios of the 
network components by learning, in particular, the behavioral description of the 
components. 



References 

1. R. Boubour and C. Jard. Fault detection in telecommunication networks based on
Petri net representation of alarm propagation. In Proc. ICATPN-97. 463

2. A. Bouloutas, G. Hart, and M. Schwartz. Fault identification using a FSM model
with unreliable partially observed data sequences. JNMC-93, pages 1074-1083. 463

3. R. Dechter, I. Meiri, and J. Pearl. Temporal constraint networks. In R. J. Brach- 
man, H. J. Levesque, and R. Reiter, editors, KR-1991, pages 61-95. MIT Press. 
467 

4. D.W. Gurer, I. Khan, R. Ogier, and R. Keffer. An artificial intelligence approach 
to network fault management. In WS-IJCAI-95. 463 

5. D. E. Knuth. The Art of Computer Programming (Volume I): Fundamental Algo- 
rithms. Addison- Wesley, Reading, MA, 1973. 470 

6. P.B. Ladkin and R.D. Maddux. On binary constraint problems. Journal of the 
Association for Computing Machinery, 41(3):435-469, 

7. 467 

8. L. Lewis. A case-based reasoning approach to the management of faults in com- 
munications networks. In Proc. of CAIA-1993, pages 114-120. 463 

9. A. Osmani. Modeling and simulating breakdown situations in telecommunication 
networks. In (IEA/AIE-1999). 464 

10. A. Osmani. Diagnostic de pannes dans les reseaux : approche a base de modeles et 
raisonnement temporel. PhD thesis, LIPN, Univ-Parisl3, 

11. 464 

12. A. Osmani, E. Mayer, M.O. Gordier, P. Dague, and F. Levy. Modelisation de 
reseaux de telecommunications pour les besoins de gestion de pannes : cas de la 
technologic ATM. Technical report, GNET-LIPN-IRISA, 1998. 464 

13. A. Osmani and L. Roze. Supervision of telecommunication networks. In Proceedings 
of the European Control Conference (ECC-99), 1999. 463, 464 

14. J.W. Sheppard and W.R. Simpson. Improving the accuracy of diagnostics provided
by fault dictionaries. In Proceedings of the VLSI-96 Test Symposium, 1996. 463






15. P. van Beek. Exact and approximate reasoning about qualitative temporal relations. 
Phd thesis, University of Waterloo, 1990. 467, 469 

16. M. Vilain and H. Kautz. Constraint propagation algorithms for temporal reasoning. 
In AAAI-1986, pages 377-382. 467 

17. M. Vilain, H. Kautz, and P. Van Beek. Constraint propagation algorithms for tem- 
poral reasoning: A revised report. In D. S. Weld and J. de Kleer, editors. Readings 
in Qualitative Reasoning about Physical Systems, pages 373-381. Kaufmann, 1990. 
470 

18. H. Wietgrefe, K. Tuchs, K. Jobmann, G. Carls, P. Frohlich, W. Nejdl, and S. Ste- 
infeld. Using neural networks for alarm correlation in cellular phone networks. In 
Proceedings of the International Workshop on Applications of Neural Networks in 
Telecommunications, 1997. 463 

19. T. Yamahira, Y. Kiriha, and S. Sakata. Unified fault management scheme for 
network troubleshooting expert system. In B. Meandzija and J. Westcott, editors. 
Integrated Network Management (I). Elsevier Science, 1989. 463 

20. S. Yemini, S. Kliger, E. Mozes, Y. Yemini, and D. Ohsie. High speed and robust 
event correlation. In IEEE Communications Magazine. 1996. 463 



A Least Common Subsumer Operation for an 
Expressive Description Logic 



Thomas Mantay



Labor für Künstliche Intelligenz, Universität Hamburg
Vogt-Kölln-Straße 30, D-22527 Hamburg
mantay@informatik.uni-hamburg.de



Abstract. Computing least common subsumers in description logics is 
an important reasoning service useful for a number of applications. As 
shown in the literature, this reasoning service can be used for the approx- 
imation of concept disjunctions in description logics, for the “bottom-up” 
construction of knowledge bases, for learning tasks, and for specific kinds 
of information retrieval. So far, computing the least common subsumer 
has been restricted to description logics with rather limited expressivity. 
In this article, we continue recent research on extending this operation
to more complex languages and present a least common subsumer operation
for the expressive description logic $\mathcal{ALQ}$ featuring qualified number
restrictions.



1 Introduction 

Terminological knowledge representation systems (TKRS) based on description 
logics (DLs) have proven to be a useful means for representing the termino- 
logical knowledge of an application domain in a structured and formally well 
understood way [5]. In DLs, knowledge bases are formed out of concepts repre- 
senting sets of individuals. Complex concepts are built out of atomic components 
and roles (representing binary relations between individuals) using the constructors
provided by the DL language. For example, the set of grandmothers can be
described using the atomic concepts woman and parent and the role has-child:
$\textit{woman} \sqcap (\geq 1\ \textit{has-child}\ \textit{parent})$.

A central feature of TKRSs based on DLs is a set of reasoning services with 
the ability to deduce implicit knowledge from explicitly represented knowledge. 
For instance, the subsumption relation between two concepts can be determined. 
Intuitively, a concept C subsumes a concept D if C is more general than D. 
The least common subsumer (LCS) operation, applied to concepts $C_1, \ldots, C_n$,
computes the most specific concept (from the infinite set of all concepts) which
subsumes $C_1, \ldots, C_n$. The LCS is an important reasoning service useful for a
number of applications. Cohen et al. consider an LCS operation for learning
tasks [4] and in order to approximate the disjunction constructor in the DL
underlying the TKRS Classic. Baader et al. use the LCS for the "bottom-up"
construction of KBs based on the DLs $\mathcal{ALN}$ [1] and $\mathcal{ALE}$ [2]. In [8], Möller et
al. apply the operation to commonality-based information retrieval where the
LCS formalizes the notion of "commonalities" of concepts. As recent literature
shows, there is a tendency to extend this reasoning service to more and more
expressive DL languages.

R. Loganantharaj et al. (Eds.): IEA/AIE 2000, LNAI 1821, pp. 474-483, 2000.
© Springer-Verlag Berlin Heidelberg 2000

The main contribution of this paper is the proposal of an LCS operation
for the DL $\mathcal{ALQ}$ consisting of the top and bottom concept, atomic concepts,
negations of atomic concepts, concept conjunctions, qualified $\geq$-restrictions, and
qualified $\leq$-restrictions. This is the first time that qualified number restrictions
are considered in research on the LCS. Moreover, the constructors of $\mathcal{ALQ}$ have
proven to be useful in our commonality-based information retrieval applications.
Due to space limitations, technical details of the presented work, including
complete proofs, are not given here but in [7].

2 Preliminaries 

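We start our analysis by reviewing the definition and some properties of the DL
$\mathcal{ALQ}$ (e.g., considered in [6]). Let $\mathcal{C}$ be a set of atomic concepts and $\mathcal{R}$ a set of
roles disjoint from $\mathcal{C}$. ($\mathcal{ALQ}$-)concepts are recursively defined as follows. The
symbols $\top$ (top) and $\bot$ (bottom) are concepts. All $A \in \mathcal{C}$ (atomic concept) and
$\neg A$ with $A \in \mathcal{C}$ (negated atomic concept) are concepts. If $C$ and $D$ are concepts,
$R \in \mathcal{R}$ is a role, and $n \in \mathbb{N} \cup \{0\}$, then $C \sqcap D$ (concept conjunction),
$(\geq n\,R\,C)$ (qualified $\geq$-restriction), and $(\leq n\,R\,C)$ (qualified $\leq$-restriction)
are also concepts. A subexpression of a concept $C$ is a substring of $C$ that
qualifies as a concept.

The semantics of a concept is defined in terms of an interpretation. An
interpretation $\mathcal{I} = (\Delta^{\mathcal{I}}, \cdot^{\mathcal{I}})$ of a concept consists of a non-empty set $\Delta^{\mathcal{I}}$ (the
domain of $\mathcal{I}$) and an interpretation function $\cdot^{\mathcal{I}}$. The interpretation function
maps every atomic concept $A$ to a subset $A^{\mathcal{I}} \subseteq \Delta^{\mathcal{I}}$ and every role $R$ to a
subset $R^{\mathcal{I}} \subseteq \Delta^{\mathcal{I}} \times \Delta^{\mathcal{I}}$. The interpretation function is recursively extended to
a complex $\mathcal{ALQ}$ concept as follows. Assume that $A^{\mathcal{I}}$, $C^{\mathcal{I}}$, $D^{\mathcal{I}}$, and $R^{\mathcal{I}}$ are
already given and $n \in \mathbb{N} \cup \{0\}$. Then

$\top^{\mathcal{I}} := \Delta^{\mathcal{I}}$, $\bot^{\mathcal{I}} := \emptyset$, $(\neg A)^{\mathcal{I}} := \Delta^{\mathcal{I}} \setminus A^{\mathcal{I}}$, $(C \sqcap D)^{\mathcal{I}} := C^{\mathcal{I}} \cap D^{\mathcal{I}}$,
$(\geq n\,R\,C)^{\mathcal{I}} := \{a \in \Delta^{\mathcal{I}} \mid \|aR^{\mathcal{I}} \cap C^{\mathcal{I}}\| \geq n\}$, and
$(\leq n\,R\,C)^{\mathcal{I}} := \{a \in \Delta^{\mathcal{I}} \mid \|aR^{\mathcal{I}} \cap C^{\mathcal{I}}\| \leq n\}$,

where $aR^{\mathcal{I}} := \{b \in \Delta^{\mathcal{I}} \mid (a, b) \in R^{\mathcal{I}}\}$.
An interpretation $\mathcal{I}$ is a model of a concept $C$ iff $C^{\mathcal{I}} \neq \emptyset$. $C$ is called satisfiable
iff $C$ has a model. $C$ is subsumed by $D$ ($C \sqsubseteq D$) iff $C^{\mathcal{I}} \subseteq D^{\mathcal{I}}$ holds for all
interpretations $\mathcal{I}$ of $C$ and $D$. $C$ is equivalent to $D$ ($C \equiv D$) iff both $C \sqsubseteq D$ and
$D \sqsubseteq C$ hold. Note that $\forall R.C$ is expressible in $\mathcal{ALQ}$ by $(\leq 0\,R\,D)$ if $D \in \mathcal{ALQ}$
exists with $D \equiv \neg C$ (which is the case if no concept conjunction occurs in any
subexpression of $C$). The depth of a concept $C$ is recursively defined over its
structure. If $C = (\geq n\,R\,C')$ or $C = (\leq n\,R\,C')$, then $\mathit{depth}(C) := 1 + \mathit{depth}(C')$,
and if $C = C_1 \sqcap \cdots \sqcap C_n$, then $\mathit{depth}(C) := \max(\{\mathit{depth}(C_i) \mid 1 \leq i \leq n\})$. In all
other cases, we set $\mathit{depth}(C) := 0$. A depth $n$ subexpression of a concept $C$ is a
subexpression of $C$ that occurs on depth $n$ of $C$. The LCS of concepts $C_1, \ldots, C_n$
is defined as the set of most specific concepts which subsume $C_1, \ldots, C_n$.

Definition 1. Let $C_1, \ldots, C_n$, $n \geq 1$, be concepts. Then we define the set
of least common subsumers of $C_1, \ldots, C_n$ as
$\mathit{lcs}(C_1, \ldots, C_n) := \{E \mid C_1 \sqsubseteq E \wedge \cdots \wedge C_n \sqsubseteq E \wedge \forall E' : C_1 \sqsubseteq E' \wedge \cdots \wedge C_n \sqsubseteq E' \Rightarrow E \sqsubseteq E'\}$.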
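
The set-theoretic semantics and the depth function above can be made concrete on a finite interpretation. The following is a minimal sketch, not part of the original paper: the tuple encoding of concepts, the function names `extension` and `depth`, and the small family-tree interpretation are all assumptions chosen for illustration.

```python
# Illustrative encoding of concepts as nested tuples (an assumption of this sketch):
#   ("top",), ("bot",), ("atom", "A"), ("not", "A"),
#   ("and", (C1, ..., Ck)), ("atleast", n, "R", C), ("atmost", n, "R", C)

def extension(concept, domain, atoms, roles):
    """Return the set of individuals in `domain` that are instances of `concept`."""
    kind = concept[0]
    if kind == "top":
        return set(domain)
    if kind == "bot":
        return set()
    if kind == "atom":
        return set(atoms.get(concept[1], set()))
    if kind == "not":
        return set(domain) - atoms.get(concept[1], set())
    if kind == "and":
        result = set(domain)
        for part in concept[1]:
            result &= extension(part, domain, atoms, roles)
        return result
    # Qualified number restrictions: count the R-successors that lie in the filler.
    _, n, role, filler = concept
    filler_ext = extension(filler, domain, atoms, roles)
    pairs = roles.get(role, set())
    result = set()
    for a in domain:
        successors = {b for (x, b) in pairs if x == a}
        count = len(successors & filler_ext)
        if (kind == "atleast" and count >= n) or (kind == "atmost" and count <= n):
            result.add(a)
    return result


def depth(concept):
    """Nesting depth of number restrictions, following the recursive definition."""
    kind = concept[0]
    if kind in ("atleast", "atmost"):
        return 1 + depth(concept[3])
    if kind == "and":
        return max((depth(part) for part in concept[1]), default=0)
    return 0


# Hypothetical interpretation: mary has child ann, ann has child tom.
domain = {"mary", "ann", "tom"}
atoms = {"woman": {"mary", "ann"}, "parent": {"mary", "ann"}}
roles = {"has-child": {("mary", "ann"), ("ann", "tom")}}
grandmother = ("and", (("atom", "woman"),
                       ("atleast", 1, "has-child", ("atom", "parent"))))
print(extension(grandmother, domain, atoms, roles))  # {'mary'}
```

In this toy interpretation only mary has a child who is herself a parent, so she is the only instance of the grandmother concept from the introduction.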

From this definition it follows immediately that, for concepts $C_1, \ldots, C_n$, all
pairs of elements of $\mathit{lcs}(C_1, \ldots, C_n)$ are equivalent. Due to this uniqueness
property, we will consider $\mathit{lcs}(C_1, \ldots, C_n)$ as a concept rather than a set of concepts
in the following. It can be shown that $\mathit{lcs}$ is associative and commutative and
$\mathit{lcs}(C_1, \ldots, C_n) \equiv \mathit{lcs}(C_1, \mathit{lcs}(C_2, \ldots \mathit{lcs}(C_{n-1}, C_n) \cdots))$. Therefore, we will
restrict the presentation of the LCS algorithm to the binary LCS operation. In the
next section, we will define a normal form for concepts with which the subsumption
problem for $\mathcal{ALQ}$ can be decided by a structural subsumption algorithm.
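
Reducing the n-ary LCS to the binary operation amounts to a fold. The sketch below illustrates this on a deliberately tiny fragment, conjunctions of (non-negated) atomic concepts encoded as frozensets of names, where the binary LCS is simply the set of shared atoms and the empty set plays the role of $\top$. The encoding and the names `lcs_atoms` and `lcs_n` are assumptions of this sketch, not the paper's algorithm for full $\mathcal{ALQ}$.

```python
from functools import reduce

def lcs_atoms(c, d):
    """Binary LCS for conjunctions of atomic concepts (frozensets of atom names)."""
    return c & d

def lcs_n(concepts):
    """n-ary LCS obtained by folding the binary operation over the argument list."""
    return reduce(lcs_atoms, concepts)

c1 = frozenset({"A", "B", "C"})
c2 = frozenset({"A", "B"})
c3 = frozenset({"B", "D"})
print(lcs_n([c1, c2, c3]))  # frozenset({'B'})
```

`reduce` folds from the left whereas the formula above nests to the right; by associativity and commutativity both orders give equivalent results.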

3 A Normal Form 

Cohen, Borgida, and Hirsh [3] showed that the LCS of two concepts can be 
determined by a simple algorithm if the concepts are in structural subsumption 
normal form (SSNF), i.e. a normal form such that the subsumption between the 
concepts can be decided by structural comparisons. In this section, we will define 
the SSNF of $\mathcal{ALQ}$ concepts. In a first step, we order the concept components
w.r.t. the concept forming constructors and perform some simplifying
semantics-preserving operations on the concept.

Definition 2. A concept $C$ is in sorted normal form (SNF) iff $C = \top$ or $C = \bot$
or $C = A \sqcap L \sqcap M$ with $L = \bigsqcap_{1 \leq i \leq n} (\geq p_i\,R_i\,C_i)$ and $M = \bigsqcap_{1 \leq i \leq m} (\leq q_i\,R'_i\,C'_i)$,
where $A$ is an arbitrary (possibly empty) conjunction of atomic and negated
atomic concepts, $C_i$ and $C'_i$ are also in SNF, and $m, n \geq 0$. Furthermore, we
assume all nested conjunctions to be flattened, i.e. $A \sqcap (B \sqcap C) \equiv A \sqcap B \sqcap C$,
and all subexpressions of $C$ which are equivalent to either $\top$ or $\bot$ are replaced
by $\top$ and $\bot$, respectively.

We will now give an algorithm with which subsumption between concepts can be 
computed by structural comparisons. Note that Algorithm 1 does not necessarily 



Algorithm 1 structural-subsumption($C$, $D$)

if $C = \top$ or $D = \bot$ or $C$ and $D$ are atomic or negated atomic concepts with $C = D$
  then true
else if $C$ is of the form $(\geq n\,R\,C')$, $D$ is of the form $(\geq m\,R\,D')$, $n \leq m$ then
  structural-subsumption($C'$, $D'$)
else if $C$ is of the form $(\leq n\,R\,C')$, $D$ is of the form $(\leq m\,R\,D')$, $m \leq n$ then
  structural-subsumption($D'$, $C'$)
else if $C$ is of the form $C_1 \sqcap \cdots \sqcap C_n$ and $D$ is of the form $D_1 \sqcap \cdots \sqcap D_m$ then
  $\forall i \in \{1, \ldots, n\} : \exists j \in \{1, \ldots, m\}$ : structural-subsumption($C_i$, $D_j$)
else false
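
Algorithm 1 translates almost line by line into code. The following is a minimal sketch under stated assumptions: concepts are encoded as nested tuples (an illustrative choice, not the paper's), a concept that is not a conjunction is treated as a conjunction with one conjunct in the last case, and the two number-restriction cases additionally check that both restrictions use the same role.

```python
TOP, BOT = ("top",), ("bot",)

# Illustrative encoding: ("atom", "A"), ("not", "A"), ("and", (C1, ..., Ck)),
# ("atleast", n, "R", C), ("atmost", n, "R", C)

def is_literal(c):
    return c[0] in ("atom", "not")

def conjuncts(c):
    """Conjuncts of c; a non-conjunction counts as a single conjunct."""
    return list(c[1]) if c[0] == "and" else [c]

def structural_subsumption(c, d):
    """Structural test whether c subsumes d, following the cases of Algorithm 1."""
    if c == TOP or d == BOT:
        return True
    if is_literal(c) and is_literal(d):
        return c == d
    if c[0] == "atleast" and d[0] == "atleast" and c[2] == d[2] and c[1] <= d[1]:
        return structural_subsumption(c[3], d[3])
    if c[0] == "atmost" and d[0] == "atmost" and c[2] == d[2] and d[1] <= c[1]:
        return structural_subsumption(d[3], c[3])
    if c[0] == "and" or d[0] == "and":
        return all(any(structural_subsumption(ci, dj) for dj in conjuncts(d))
                   for ci in conjuncts(c))
    return False

# X1 = (>= 2 R TOP), Y1 = (>= 1 R A) AND (>= 1 R not A)
A, NOT_A = ("atom", "A"), ("not", "A")
X1 = ("atleast", 2, "R", TOP)
Y1 = ("and", (("atleast", 1, "R", A), ("atleast", 1, "R", NOT_A)))
print(structural_subsumption(X1, Y1))  # False
```

The call returns False although $Y_1$ is semantically subsumed by $X_1$; the purely structural test is incomplete on concepts in which implied number restrictions are not spelled out.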



return the correct result for arbitrary $\mathcal{ALQ}$ concepts. In the sequel, we will define
the structural subsumption normal form (SSNF) for concepts $C$ and $D$ such
that Algorithm 1 returns true when invoked with the SSNFs of $C$ and $D$ iff $D \sqsubseteq C$. For
a satisfiable concept $C$ in which a role $R$ occurs as a substring in a depth 0
subexpression, we say that $C$ has $n$ $R$-successors iff, for all models $\mathcal{I}$ of $C$, there
are individuals $i, j_1, \ldots, j_n \in \Delta^{\mathcal{I}}$ such that $(i, j_1), \ldots, (i, j_n) \in R^{\mathcal{I}}$.

When determining the subsumption relationship between concepts by structural
comparisons, possible "interactions" between concept forming constructors
must be considered. We point these interactions out by analyzing some examples.
Let $X_1 := (\geq 2\,R\,\top)$ and $Y_1 := (\geq 1\,R\,A) \sqcap (\geq 1\,R\,\neg A)$. Then, the invocation
structural-subsumption($X_1$, $Y_1$) returns false, even though $Y_1 \sqsubseteq X_1$ holds. The
reason for $Y_1 \sqsubseteq X_1$ is that $Y_1$ implies the qualified $\geq$-restriction $(\geq 2\,R\,\top)$.
This concept is not explicitly present as a subexpression in $Y_1$, but its existence
would be relevant for the structural subsumption algorithm to work correctly.
The following example shows that interactions between qualified $\geq$-restrictions
can be influenced by qualified $\leq$-restrictions. Let $X_2 := (\geq 1\,R\,(A \sqcap B \sqcap C))$
and $Y_2 := (\geq 1\,R\,A) \sqcap (\geq 1\,R\,B) \sqcap (\geq 1\,R\,C) \sqcap (\leq 1\,R\,\top)$. Again, the
invocation structural-subsumption($X_2$, $Y_2$) returns false, even though $Y_2 \sqsubseteq X_2$
holds. The reason for $Y_2 \sqsubseteq X_2$ is that $Y_2$ implies the qualified $\geq$-restriction
$(\geq 1\,R\,(A \sqcap B \sqcap C))$. If this concept were explicitly represented in $Y_2$, Algorithm 1
would return the correct result. Obviously, $(\geq 1\,R\,(A \sqcap B \sqcap C))$ is the most
specific qualified $\geq$-restriction following from $Y_2$ which is not explicitly
represented in $Y_2$. Finally, let $X_3 := (\leq 0\,R\,\neg A)$ and $Y_3 := (\geq 1\,R\,A) \sqcap (\leq 1\,R\,\top)$.
The relation $Y_3 \sqsubseteq X_3$ holds, but the invocation structural-subsumption($X_3$, $Y_3$)
returns false. However, if the additional qualified $\leq$-restriction $(\leq 0\,R\,\neg A)$ were
explicitly represented as a conjunct in $Y_3$, Algorithm 1 would return true as
desired.

As a consequence, in order to transform a concept $C$ into a normal form $\widehat{C}$
such that subsumption can be decided by structural comparisons, the idea is
to compute the set of most specific qualified $\geq$- and $\leq$-restrictions following
from $C$ and conjunctively add the elements of this set to $C$ as additional concept
components. We recursively repeat this process on the quantifiers occurring in $C$
until we end up with a concept which is in SSNF by definition. Obviously, atomic
concept components do not influence the process of computing the SSNF of a
concept. Since $\widehat{C}$ contains no "new" information w.r.t. $C$, semantic equivalence
between $C$ and $\widehat{C}$ is guaranteed. In our examples, if we define $\widehat{X}_1 := X_1$,
$\widehat{Y}_1 := Y_1 \sqcap (\geq 2\,R\,\top)$, $\widehat{X}_2 := X_2$, $\widehat{Y}_2 := Y_2 \sqcap (\geq 1\,R\,(A \sqcap B \sqcap C))$, $\widehat{X}_3 := X_3$,
$\widehat{Y}_3 := Y_3 \sqcap (\leq 0\,R\,\neg A)$, the invocations structural-subsumption($\widehat{X}_i$, $\widehat{Y}_i$), $i \in \{1, 2, 3\}$,
return true as desired. We will now formalize these ideas and define the SSNF
of a concept recursively on its structure.

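Definition 3. Let $C$ be a concept in SNF given by the expression in Definition 2,
$\mathcal{L}_C := \{(\geq k\,R\,D) \mid C \sqcap (\geq k\,R\,D) \equiv C \wedge \forall (\geq l\,R\,D') : C \sqcap (\geq l\,R\,D') \equiv C \Rightarrow (\geq k\,R\,D) \sqsubseteq (\geq l\,R\,D')\}$, and
$\mathcal{M}_C := \{(\leq k\,R\,D) \mid C \sqcap (\leq k\,R\,D) \equiv C \wedge \forall (\leq l\,R\,D') : C \sqcap (\leq l\,R\,D') \equiv C \Rightarrow (\leq k\,R\,D) \sqsubseteq (\leq l\,R\,D')\}$.
Then we define the structural subsumption normal form (SSNF) (w.r.t. $C$) as a concept $\widehat{C}$ as
follows:

(i) If $\mathit{depth}(C) = 0$, then $\widehat{C} := C$, and

(ii) if $\mathit{depth}(C) > 0$, then $\widehat{C} := A \sqcap \widehat{L} \sqcap \widehat{M} \sqcap \bigsqcap_{D \in \mathcal{L}_C \cup \mathcal{M}_C} \widehat{D}$, where $\widehat{L}$, $\widehat{M}$, and $\widehat{D}$ are in
SSNF w.r.t. $L$, $M$, and $D$, respectively.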

Intuitively, for a concept $C$ and a fixed concept depth, the set $\mathcal{L}_C$ ($\mathcal{M}_C$)
contains the most specific qualified $\geq$-restrictions ($\leq$-restrictions) following from the
depth 0 components of $C$ which are relevant for structural subsumption
computation, but possibly not explicitly represented as conjuncts on depth 0 of $C$.
These sets are used for the definition of a new concept $\widehat{C}$ in which the additional
qualified number quantifications are made explicit. Since $\mathcal{L}_C$ and $\mathcal{M}_C$ are finite
in case $C \not\equiv \bot$, $\widehat{C}$ is well defined. In [7], we prove the following theorem.

Theorem 1. Let $C$ and $D$ be concepts and $\widehat{C}$ ($\widehat{D}$) be in SSNF w.r.t. $C$ ($D$).
Then structural-subsumption($\widehat{C}$, $\widehat{D}$) returns true iff $D \sqsubseteq C$.

When determining the subsumption relationship between two concepts $C$ and $D$,
we first compute concepts $\widehat{C}$ and $\widehat{D}$ in SSNF w.r.t. $C$ and $D$, respectively, and
invoke Algorithm 1 on $\widehat{C}$ and $\widehat{D}$. Theorem 1 guarantees that the subsumption
relationship $D \sqsubseteq C$ holds iff structural-subsumption($\widehat{C}$, $\widehat{D}$) returns true.

4 Computing the Normal Form 

The preceding section showed that in order to compute a concept $\widehat{C}$ in SSNF
w.r.t. a concept $C$, we must determine the set of most specific qualified number
restrictions following from the subexpressions on each depth of $C$. The difficulty
is to capture exactly those qualified number restrictions which are induced
by all models of $C$. Therefore, the algorithm is closely related to the strategy
of a tableau prover for $\mathcal{ALQ}$ [6]. Since tableau provers aim at creating a
constraint system representing only one possible model of $C$, our algorithm will
create a finite set of constraint systems representing the set of all partial^1
models of $C$. Each constraint system of this set induces a conjunction of qualified
$\geq$-restrictions and, by considering their commonalities (formalized by an LCS
operation on subexpressions of depth smaller than $C$'s depth), the set $\mathcal{L}_C$ in
Definition 3 can be obtained. From this set and the qualified $\geq$-restrictions
occurring on the top level of $C$ we will then determine the set $\mathcal{M}_C$. In order to
keep the presentation of the algorithm simple, we will only consider satisfiable
concepts $C$ with $C \not\equiv \top$. This can be done without loss of generality since
subsumption between concepts $C$ and $D$ can easily be determined by Algorithm 1
if $C$ or $D$ (or both) are equivalent to $\top$ or $\bot$.

We first introduce some helpful notation. Throughout the rest of this section,
for a concept $C$ in SNF given by the expression in Definition 2, let

$\mathfrak{L}_C := \{(\geq p_1\,R_1\,C_1), \ldots, (\geq p_n\,R_n\,C_n)\}$ and

$\mathfrak{M}_C := \{(\leq q_1\,R'_1\,C'_1), \ldots, (\leq q_m\,R'_m\,C'_m)\}$.

A constraint is a syntactic object of the form $(R, D)$ where $R$ is a role and $D$
is a concept. Intuitively, $(R, D)$ represents an $R$-successor with property $D$. A
constraint system is a finite, non-empty multiset of constraints.

Definition 4. Let $C$ be a concept in SNF given by the expression in Definition 2.
Then we define the function $\mathit{cs}(C) := \{(R, D)^1, \ldots, (R, D)^n \mid (\geq n\,R\,D) \in \mathfrak{L}_C\}$.

The function $\mathit{cs}(C)$ returns the constraint system induced by the qualified
$\geq$-restrictions on the top level of $C$. Our intention is to successively modify $\mathit{cs}(C)$
into a set of new constraint systems $M$ such that the set $\mathcal{L}_C$ can easily be derived
from $M$. Therefore, it is convenient to exhaustively "merge" the constraints in
$\mathit{cs}(C)$. For a concept $C$ and a constraint system $CS$, we say that $CS$ is compatible
to $\mathfrak{L}_C$ iff, for all $(\geq n\,R\,D) \in \mathfrak{L}_C$, there exist at least $n$ constraints of the form
$(R, D')$ in $CS$ such that $D' \sqsubseteq D$. $CS$ is compatible to $(\leq n\,R\,D) \in \mathfrak{M}_C$ iff there
exist at most $n$ constraints of the form $(R, D')$ in $CS$ such that $D' \sqsubseteq D$. $CS$ is
compatible to $\mathfrak{M}_C$ iff $CS$ is compatible to all $(\leq n\,R\,D) \in \mathfrak{M}_C$.

^1 We use the expression "partial models" since the constraint systems do not represent
constraints imposed by atomic concept components.
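
The induced constraint system and the two compatibility conditions can be sketched as follows. This is an illustration under simplifying assumptions, not the paper's implementation: restrictions are triples `(n, role, filler)`, fillers are conjunctions of atomic concepts encoded as frozensets (the empty frozenset standing for the top concept), multisets are `collections.Counter` objects, and the subsumption test between fillers is passed in as a function.

```python
from collections import Counter

def cs(at_least_restrictions):
    """Constraint system induced by top-level restrictions: n copies of (role, filler)."""
    system = Counter()
    for n, role, filler in at_least_restrictions:
        system[(role, filler)] += n
    return system

def count_matching(system, role, filler, subsumed_by):
    """Number of constraints (role, d) in the multiset with d subsumed by filler."""
    return sum(k for (r, d), k in system.items()
               if r == role and subsumed_by(d, filler))

def compatible_with_at_leasts(system, at_leasts, subsumed_by):
    return all(count_matching(system, r, d, subsumed_by) >= n for n, r, d in at_leasts)

def compatible_with_at_mosts(system, at_mosts, subsumed_by):
    return all(count_matching(system, r, d, subsumed_by) <= n for n, r, d in at_mosts)

def atoms_subsumed_by(d1, d2):
    """d1 is subsumed by d2 for conjunctions of atoms: d1 must contain every atom of d2."""
    return d2 <= d1

# Example concept Y2: (>= 1 R A), (>= 1 R B), (>= 1 R C), (<= 1 R TOP)
A, B, C, TOP = frozenset("A"), frozenset("B"), frozenset("C"), frozenset()
at_leasts_y2 = [(1, "R", A), (1, "R", B), (1, "R", C)]
at_mosts_y2 = [(1, "R", TOP)]

system = cs(at_leasts_y2)
merged = Counter({("R", A | B | C): 1})
print(compatible_with_at_mosts(system, at_mosts_y2, atoms_subsumed_by))  # False
print(compatible_with_at_mosts(merged, at_mosts_y2, atoms_subsumed_by))  # True
```

The unmerged system contains three $R$-successors and therefore violates $(\leq 1\,R\,\top)$, while the system with a single merged constraint satisfies both the $\geq$- and the $\leq$-restrictions of $Y_2$.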






Definitions. Let CS = Ci), . . . , (7?^, he a constraint system 

and ci'^d Wlc be given for a concept C. Then we say that CS' emerges 
from CS hy application of the T-merging rule (CS — CS' ) iff there exist 
k^k' G , n}, k 7^ k' , such that CS' = {{Ri^ Ci)\i G , '^} \ {k, k'}} U 

{{Rk^Ck n Ck')}, CS is not compatible to "iLflc , CS' is compatible to Rk = 
Rk' , and Ck^~^Ck' is satisfiable. CS' emerges from CS hy successive applications 
of the T-merging rule iff there exist CSi ^ . . . , CSr, r G MU { 0 }; such that CS = 
CSi — • • • — CSr = CS' . CS' is T-merging rule complete w.r.t. CS iff 
CS' emerges from CS hy successive applications of the T -merging rule and there 
exists no CS" such that CS' — CS" . In this case, CS' is called a T-merging 
rule completion of CS. Furthermore, we define the set of T-merging rule com- 
pletions of CS as := {CS'\CS' is a T -merging rule completion of CSf. 
Intuitively, by applying the T-merging rule to a constraint system, we yield a 
new constraint system in which two constraints are replaced by a new constraint 
whose concept argument consists of the conjunction of the concept arguments of 
the two eliminated constraints. In Definition 5 , the reason (and condition at the 
same time) for applying the T-merging rule to CS is that CS incompatible 
to Wlc- Also we demand the result of the rule application to be compatible 
to £(7. For the new constraint to be able to represent the two merged constraints, 
the two merged constraints must involve the same role and the conjunction of 
their concept components must be satisfiable. A constraint system CS is T- 
merging rule complete if no further T -merging rule application to CS is possible. 
Let us apply the T-merging rule to the constraint system cs(Y2) obtained from 
our example concept Y2. We have cs(Y2) = {{R, A), {R, B), (R,C)} and, after 
successive applications of — we get = {{{R, A\~\ B \~\ C)}, {{R, A □ 

cnB)},{{R,BnAnC)},{{R,BncnA)},{(R,cnAnB)},{{R,cnBnA)}}. 

Now, from derive that Y2 has at least one i?-successor which is an 

instance of A\lBr\C. However, by exhaustive application of the T -merging rule 
to csiYf) with L4 := (> 1 A) □ (> 1 RB)r\{> 1 RC)n{< 2 RT), we yield a 
set of T-merging rule completions from which we would derive that I4 

has at least two i?-successors, even though we can easily find a model of I4 in 
which only one i^-successor is present. The idea to circumvent this problem is 
to introduce a T-merging rule in which the conditions for the rule application 
is modified. Later we will consider the results of exhaustive applications of both 
merging rules when determining the additional qualified >-restrictions. 
Definition 6. Let CS = Ci), . . . ,{Rn,Cn)} be a constraint system and 

£(7 dJtc be given for a concept C . Then we say that CS' emerges from 
CS by application of the T-merging rule (CS — CS') iff there exist k,k' G 
{!,... ,n},k k' , such that CS' = {{Ri, Ci)\i G , n}\{k, k'}}U{{Rk, Ck^l 

Ck')}, CS' is compatible to He, for all (< nRD) G OJlc; if CS is compatible to 
(< nRD), then CS' is also compatible to (< nRD), Rk = Rk' , ond Ck ^Cy 
is satisfiable. CS' emerges from CS by successive applications of the T-merging 
rule iff there exist CSi,... ,CSr, r G WU { 0 }, such that CS = CSi — 

• • • — CSr = CS' . CS' is T-merging rule complete w.r.t. CS iff C S' emerges 
from CS by successive applications of the T-merging rule and there exists no 
CS" such that CS' — CS" . In this case, CS' is called a T-merging rule 
completion of CS. Furthermore, we define the set of T-merging rule completions 
of CS as := {CS'\CS' is a T-merging rule completion of CSf. 






Unlike in Definition 5, the reason for applying the T-merging rule is no longer 
given by the elements of fOtc- Instead we want to merge as many constraints 
as possible, while taking care that the constraint system remains compatible to 
both and each element of dJlc- For our example concept I 4 , we get = 

Mj^^y^y Considering the “commonalities” of the constraint systems in 
and M^^^y^y we can derive that Y 4 has at least one i^-successor (with prop- 
erty T), which is a correct conclusion. 

We will now formalize the idea of computing Cc and Me (see Defini- 
tion 3) from and for a concept C. We introduce the functions 

at-leasts{C S) which extracts the qualified >-restrictions induced by CS. 
Definition 7. Let $CS$ be a constraint system and $L_{CS} := \{(\geq n\,R\,D) \mid
\{(R, D_1), \ldots, (R, D_n)\} \subseteq CS$ and $D \equiv \mathit{lcs}(D_1, \ldots, D_n)\}$. Then we define
at-leasts($CS$) $:= \bigsqcap_{(\geq n\,R\,D) \in L_{CS}} (\geq n\,R\,D)$.

For each subset $P = \{(R, D_1), \ldots, (R, D_n)\}$ of $CS$, $L_{CS}$ contains a qualified
$\geq$-restriction of the form $(\geq n\,R\,D)$, where $n$ corresponds to the number of elements
in $P$ and $D$ corresponds to the LCS of the concepts $D_1, \ldots, D_n$. The elements
of $L_{CS}$ represent all qualified $\geq$-restrictions induced by $CS$. As an example, let
$CS = \{(R, A \sqcap B), (R, C)\}$. Then we get at-leasts($CS$) $= (\geq 1\,R\,(A \sqcap B)) \sqcap
(\geq 1\,R\,C) \sqcap (\geq 2\,R\,\top)$.
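
The subset enumeration behind at-leasts can be sketched in a few lines. The sketch below makes simplifying assumptions that are not in the paper: fillers are conjunctions of atomic concepts encoded as frozensets, so that the LCS of several fillers is their intersection and the empty frozenset stands for $\top$; the function name `at_leasts` and the triple encoding `(n, role, filler)` are likewise illustrative.

```python
from itertools import combinations

def at_leasts(constraints):
    """Return the set of (n, role, filler) restrictions induced by a constraint system.

    constraints: list of (role, frozenset_of_atoms) pairs (a multiset given as a list).
    """
    result = set()
    for role in {r for r, _ in constraints}:
        fillers = [d for r, d in constraints if r == role]
        for n in range(1, len(fillers) + 1):
            for subset in combinations(fillers, n):
                # LCS of atom conjunctions = atoms shared by all members of the subset.
                result.add((n, role, frozenset.intersection(*subset)))
    return result

A_AND_B = frozenset({"A", "B"})
C = frozenset({"C"})
for restriction in sorted(at_leasts([("R", A_AND_B), ("R", C)]), key=lambda t: t[0]):
    print(restriction)
```

For the example constraint system this yields exactly the three restrictions stated above: one for each single constraint and one with filler $\top$ for the pair. Note that the number of subsets grows exponentially with the size of the constraint system.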

Given a concept (7, our intention is to apply the function at-leasts to the ele- 
ments of and yielding a conjunction of the most specific qualified 

>-restrictions following from the depth 0 components of C. 

Definition 8 . Let C he a concept. Then we define 
>-completion{C) '= at-leasts{CS)McSGM^ at-leasts{C S)) . 

^ ^cs(C) ^ ^cs(C) 

The concept >-completion((7) is a conjunction of qualified >-restrictions which 
represents the commonalities of the sets of qualified >-restrictions induced 
by and ^ct(C)- In an algorithm implementing Definition 8 , the normal- 

ization procedure of the qualified >-restrictions which are the parameters for 
the LCS in Definition 8 can be restricted to their concept arguments. Therefore, 
normalization is done for concepts of depth smaller than depth{C). This shows 
that there is an inductive dependency between the SSNF of a concept and the 
LCS: The SSNF of a concept of depth 0 does not involve LCS computations 
since such a concept is already in SSNF by definition. The SSNF of a concept of 
depth n falls back on an LCS computation of concepts which requires subcon- 
cepts of depth n — 1 to be in SSNF. In order to compute >-completion(l 2 ), we 
must determine the LCS of the two conjunctions at-leasts{C S) and 

cs(Y2) 

^csgm^ at-leasts{C S)) . As can be verified by the algorithm given in Sec- 

cs(Y2) 

tion 5, the result of this LCS computation is given by the concept (> \ R{AU 
B nC)). Now we will show how to compute Me- 

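Definition 9. Let $C$ be a concept, $\geq$-completion($C$) $= L_1 \sqcap \cdots \sqcap L_p$,
$\mathfrak{L}'_C := \{L_1, \ldots, L_p\} \cup \mathfrak{L}_C$, and
$\mathcal{M}_C := \{(\leq (m - n)\,R\,D') \mid \exists (\geq n\,R\,D) \in \mathfrak{L}'_C, (\leq m\,R\,E) \in \mathfrak{M}_C :
m \geq n \wedge D \sqsubseteq E \wedge D' \in \mathcal{ALQ} \wedge D' \equiv \neg D\}$. Then we define
$\leq$-completion($C$) $:= \bigsqcap_{(\leq n\,R\,D) \in \mathcal{M}'_C} (\leq n\,R\,D)$, where $\mathcal{M}'_C$ is the largest subset of $\mathcal{M}_C$ in which
no pair of equivalent concepts occurs.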






The definition of $\leq$-completion($C$) reflects the idea that a subexpression of
$C \sqcap \geq$-completion($C$) of the form $(\geq n\,R\,D) \sqcap (\leq m\,R\,E)$ implies $(\leq (m - n)\,R\,D')$ if
$m \geq n$, $D \sqsubseteq E$, and $D' \in \mathcal{ALQ}$ with $D' \equiv \neg D$. If these conditions are fulfilled,
$\mathcal{M}_C$ is the set of most specific qualified $\leq$-restrictions following from the depth 0
components of $C$. We conjoin $\top$ to $\leq$-completion($C$) in case $\mathcal{M}_C = \emptyset$. With
completion($C$) := $\geq$-completion($C$) $\sqcap$ $\leq$-completion($C$), we can now give an
algorithm for computing a concept $\widehat{C}$ in SSNF w.r.t. $C$ and the final result of
this section.



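Algorithm 2 c-ssnf($C$)

if $C = \top$ or $C = \bot$ or $C$ is a conjunction of atomic and negated atomic concepts
  then $C$
else if $C = (\geq n\,R\,C')$ then $(\geq n\,R\,$ c-ssnf($C'$)$)$
else if $C = (\leq n\,R\,C')$ then $(\leq n\,R\,$ c-ssnf($C'$)$)$
else if $C$ is of the form $A \sqcap L \sqcap M$ then
  // $A$, $L$, and $M$ are defined as in Definition 2
  // $\geq$-completion($C$) $= (\geq k_1\,S_1\,D_1) \sqcap \cdots \sqcap (\geq k_r\,S_r\,D_r)$
  // $\leq$-completion($C$) $= (\leq l_1\,T_1\,E_1) \sqcap \cdots \sqcap (\leq l_s\,T_s\,E_s)$
  $A \sqcap (\geq p_1\,R_1\,$ c-ssnf($C_1$)$) \sqcap \cdots \sqcap (\geq p_n\,R_n\,$ c-ssnf($C_n$)$)$
  $\sqcap\ (\leq q_1\,R'_1\,$ c-ssnf($C'_1$)$) \sqcap \cdots \sqcap (\leq q_m\,R'_m\,$ c-ssnf($C'_m$)$)$
  $\sqcap\ (\geq k_1\,S_1\,$ c-ssnf($D_1$)$) \sqcap \cdots \sqcap (\geq k_r\,S_r\,$ c-ssnf($D_r$)$)$
  $\sqcap\ (\leq l_1\,T_1\,E_1) \sqcap \cdots \sqcap (\leq l_s\,T_s\,E_s)$
endif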



Theorem 2. Let $C$ be a concept in SNF. Then c-ssnf($C$) returns a concept $\widehat{C}$
which is in SSNF w.r.t. $C$.

Theorem 2 ensures that we obtain a concept $\widehat{C}$ which is in SSNF w.r.t. $C$ if
we add the conjunction of qualified number restrictions given by
completion($C$) to the depth 0 components of $C$ and recursively repeat this
process for all quantifiers occurring in the qualified number restrictions (including
the new qualified $\geq$-restrictions resulting from the $\geq$-completion computation)
until we end up with a concept which is in SSNF by definition. Modulo
commutativity of conjunction, for $Y_2$ and $Y_3$, we get $\widehat{Y}_2 = Y_2 \sqcap (\geq 1\,R\,(A \sqcap B \sqcap C))$
and $\widehat{Y}_3 = Y_3 \sqcap (\leq 0\,R\,\neg A)$ as desired. We will now use the results of this section
for computing the LCS of two concepts.

5 The LCS Algorithm 

Given concepts $C$ and $D$ and their corresponding concepts in SSNF, $\widehat{C}$ and $\widehat{D}$,
$\mathit{lcs}(C, D)$ can straightforwardly be implemented as an algorithm taking $\widehat{C}$
and $\widehat{D}$ as arguments. Algorithm 3 recursively computes $\mathit{lcs}(C, D)$ with
arguments $\widehat{C}$ and $\widehat{D}$. Let $X_5 := A \sqcap \widehat{Y}_3$ and $Y_5 := A \sqcap (\geq 2\,R\,B) \sqcap (\leq 1\,R\,(\neg A \sqcap C))$.
Then, both $X_5$ and $Y_5$ are in SSNF and compute-lcs($X_5$, $Y_5$) returns a concept
equivalent to $A \sqcap (\geq 1\,R\,\top) \sqcap (\leq 1\,R\,(\neg A \sqcap C))$ as desired.






Algorithm 3 compute-lcs($C$, $D$)

if $D \sqsubseteq C$ then $C$ else if $C \sqsubseteq D$ then $D$
else if ($C = A \vee C = \neg A$) and ($D = B \vee D = \neg B$) for atomic concepts $A$ and $B$
  then if $C = D$ then $C$ else $\top$
else if $C = (\geq n\,R\,C')$ and $D = (\geq m\,R\,D')$ then
  $(\geq \min(\{n, m\})\,R\,$ compute-lcs($C'$, $D'$)$)$
else if $C = (\leq n\,R\,C')$ and $D = (\leq m\,R\,D')$ then
  $(\leq \max(\{n, m\})\,R\,(C' \sqcap D'))$
else if $C = C_1 \sqcap \cdots \sqcap C_n$ then $\bigsqcap_{1 \leq i \leq n}$ compute-lcs($C_i$, $D$)
else if $D = D_1 \sqcap \cdots \sqcap D_n$ then compute-lcs($D$, $C$)
else $\top$ endif
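
The recursion of Algorithm 3 can be sketched in code and checked against the example concepts $X_5$ and $Y_5$. The sketch rests on several assumptions that are not part of the paper: concepts are nested tuples, both inputs are assumed to be in SSNF already (so the structural test of Algorithm 1 is used for the two subsumption checks), number restrictions are only combined when their roles agree, and the helper `conj` flattens conjunctions and drops top concepts and duplicate conjuncts so that results are easy to compare.

```python
TOP, BOT = ("top",), ("bot",)

def is_literal(c):
    return c[0] in ("atom", "not")

def conjuncts(c):
    return list(c[1]) if c[0] == "and" else [c]

def conj(parts):
    """Flattened conjunction without TOP and without duplicate conjuncts."""
    flat = []
    for p in parts:
        for q in conjuncts(p):
            if q != TOP and q not in flat:
                flat.append(q)
    if not flat:
        return TOP
    return flat[0] if len(flat) == 1 else ("and", tuple(flat))

def subsumes(c, d):
    """Structural subsumption (Algorithm 1): does c subsume d?"""
    if c == TOP or d == BOT:
        return True
    if is_literal(c) and is_literal(d):
        return c == d
    if c[0] == "atleast" and d[0] == "atleast" and c[2] == d[2] and c[1] <= d[1]:
        return subsumes(c[3], d[3])
    if c[0] == "atmost" and d[0] == "atmost" and c[2] == d[2] and d[1] <= c[1]:
        return subsumes(d[3], c[3])
    if c[0] == "and" or d[0] == "and":
        return all(any(subsumes(ci, dj) for dj in conjuncts(d)) for ci in conjuncts(c))
    return False

def compute_lcs(c, d):
    """Binary LCS of two concepts assumed to be in SSNF, following Algorithm 3."""
    if subsumes(c, d):
        return c
    if subsumes(d, c):
        return d
    if is_literal(c) and is_literal(d):
        return c if c == d else TOP
    if c[0] == "atleast" and d[0] == "atleast" and c[2] == d[2]:
        return ("atleast", min(c[1], d[1]), c[2], compute_lcs(c[3], d[3]))
    if c[0] == "atmost" and d[0] == "atmost" and c[2] == d[2]:
        return ("atmost", max(c[1], d[1]), c[2], conj([c[3], d[3]]))
    if c[0] == "and":
        return conj([compute_lcs(ci, d) for ci in c[1]])
    if d[0] == "and":
        return compute_lcs(d, c)
    return TOP

A, B, C = ("atom", "A"), ("atom", "B"), ("atom", "C")
NOT_A = ("not", "A")
NOT_A_AND_C = ("and", (NOT_A, C))
X5 = ("and", (A, ("atleast", 1, "R", A), ("atmost", 1, "R", TOP),
              ("atmost", 0, "R", NOT_A)))
Y5 = ("and", (A, ("atleast", 2, "R", B), ("atmost", 1, "R", NOT_A_AND_C)))
print(compute_lcs(X5, Y5))
```

With this encoding the result is the conjunction of `A`, `("atleast", 1, "R", TOP)`, and `("atmost", 1, "R", NOT_A_AND_C)`, which matches the concept stated for compute-lcs($X_5$, $Y_5$) above.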



Theorem 3. Let $C$ and $D$ be concepts and $\widehat{C}$ ($\widehat{D}$) be in SSNF w.r.t. $C$ ($D$).
Then compute-lcs($\widehat{C}$, $\widehat{D}$) returns a concept which is equivalent to $\mathit{lcs}(C, D)$.
Furthermore, the size of $\mathit{lcs}(C, D)$ is polynomial in the sizes of $C$ and $D$, and
$\mathit{lcs}(C_1, \ldots, C_n)$ may grow exponentially in the sizes of $C_1, \ldots, C_n$ whose sizes
are polynomial in $n$.

6 Conclusion 

We have presented an LCS operation for the expressive DL $\mathcal{ALQ}$. Computing
the LCS for concepts is a crucial inference service applicable to a number of 
applications. This article contributes to recent research on extending the LCS 
to more and more expressive DLs. As shown in the literature, the LCS can be 
computed by a simple algorithm if, for each language constructor, a unique least 
upper bound operation on the arguments of this constructor can be provided and 
the concepts are first transformed into a normal form with which subsumption 
between the concepts can be decided by a structural subsumption algorithm. 
The special challenge one faces is that interactions between different concept 
constructors imply relevant implicit information that must be made explicit. 
We presented an algorithm which accomplishes this difficult task. Moreover, we 
provided the least upper bound operations on the arguments of the constructors
of $\mathcal{ALQ}$ and stated complexity results. Future research should include the
extension of the LCS operation to more complex DLs.



References 

1. F. Baader and R. Küsters. Computing the Least Common Subsumer and the Most
Specific Concept in the Presence of Cyclic $\mathcal{ALN}$-Concept Descriptions. In Proc. of
the 22nd German Conference on AI, pages 129-140, 1998. 474

2. F. Baader, R. Küsters, and R. Molitor. Computing Least Common Subsumer in
Description Logics with Existential Restrictions. In Proc. of the IJCAI, pages 96-
101, 1999. 474

3. W. W. Cohen, A. Borgida, and H. Hirsh. Computing Least Common Subsumers
in Description Logics. In Proc. of the Int. Conf. on Fifth Generation Computer
Systems, pages 1036-1043, Japan, 1992. Ass. for Computing Machinery. 476

4. W.W. Cohen and H. Hirsh. The Learnability of Description Logics with Equality
Constraints. Machine Learning, 17:169-199, 1994. 474

5. F. M. Donini, M. Lenzerini, D. Nardi, and A. Schaerf. Principles of Knowledge
Representation, chapter Reasoning in Description Logics, pages 191-236. 1996. 474

6. B. Hollunder and F. Baader. Qualifying Number Restrictions in Concept Languages.
In Proc. of the KR, pages 335-346, San Mateo, CA, USA, April 1991. 475, 478

7. T. Mantay. Computing Least Common Subsumers in Expressive Description Logics.
Technical Report FBI-HH-M-286/99, Department of Computer Science, University
of Hamburg, 1999. 475, 477

8. R. Möller, V. Haarslev, and B. Neumann. Semantics-Based Information Retrieval.
In Int. Conf. on Information Technology and Knowledge Systems, 1998. 474



Blob Analysis Using Watershed Transformation 



Yi Cui¹ and Nan Zhou² 

¹ P. O. Box 108, Beijing University of Posts and Telecommunications 
Beijing, 100088, P. R. China 
Yicui@public.bta.net.cn 
² Department of Mechanical Engineering 
Texas Tech University, TX 79410 
Nzhou@ttu.edu 



Abstract. This paper presents a novel method for segmenting overlapping or 
touching blob objects (particles). It is based on the watershed 
transformation, one of the most powerful image analysis tools provided by 
mathematical morphology. In this method, we first build the distance function 
of the blob image, then extract the regional minima as markers, and finally 
perform the watershed transformation. The application of this algorithm is 
illustrated using examples of red blood cell segmentation and broken medicine 
pill detection. 



1. Introduction 

Accurate visual separation of touching blob objects is a key problem in many 
applications. A. Martelli [1] and J. M. Lester et al. [2] presented boundary-finding 
algorithms based on heuristic graph searching techniques for touching blood cell 
segmentation. Their method uses only the local optical density information of the 
image. It must determine a starting point and an ending point on the boundary, and 
cannot yield the closed boundary of the touching blob object. Another method, which 
uses the boundary curvature function to find the points from which to draw cutting 
lines to segment touching objects, is presented in [9]. Here we discuss a watershed 
transformation [3,4] method for touching object segmentation. The watershed 
transformation is an efficient image segmentation method based on mathematical 
morphology [11]. It can yield either complete tessellations of the segmented image or 
closed contours of the segmented objects. The watershed method for overlapping object 
segmentation begins by marking the objects to be segmented and then refines the result 
by outlining the objects' contours, so it is also called marker-driven watershed 
segmentation [5]. 

In this paper, we first review some basic definitions concerning the watershed 
transformation, then discuss the implementation of the method, and finally present 
some experimental results of the watershed method. 



R. Loganantharaj et al. (Eds.): lEA/AIE 2000, LNAI 1821, pp. 482-491, 2000. 
© Springer-Verlag Berlin Heidelberg 2000 







2. Definitions 



2.1 Distance Function 



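The distance function Dist(p) associated with a set X is given by 

$$\forall p \in X, \quad Dist(p) = \min\{\, n \in \mathbb{N} \mid p \notin X \ominus nB \,\}$$ 

where $X \ominus nB$ denotes the erosion of X by the disk B of size n. For each pixel 
$p \in X$, Dist(p) is the distance between p and the background: 

$$Dist_X(p) = dist(p, X^c)$$ 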

The distance function is the basis of the watershed algorithm. From the distance 
function we can extract the ultimate erosions, which can serve as markers for touching 
blob image segmentation. We can view the distance function as a gray-scale image or 
a topographic surface. The distance function of the red blood cell image is shown in 
Fig. 1(c). 



2.2 Regional Maximum 

A regional maximum M of a gray-scale image I is a connected component of 
pixels with a given value h such that every pixel in the neighborhood of M has a value 
strictly lower than h. 

By definition, a regional maximum at altitude h of the distance function is a 
connected component of pixels at altitude h of Dist such that every neighboring pixel 
has an altitude strictly smaller than h. 



2.3 Ultimate Erosion 

Let B be a unit disk and X a binary image. We can successively erode X by B. As the 
erosion sequence proceeds, disconnected components are produced, and at various 
stages of the iteration various components disappear. A final component is a component 
at the step immediately before it disappears. The union of the final components is 
called the ultimate erosion. 

The ultimate erosion of a set X is equal to the union of the regional maxima of 
the distance function of X. 

The level lines and the ultimate erosion of the distance function of the red blood 
cell image are shown in Fig. 1(d). 








Fig. 1. Definition of the watersheds method. (a) Original image (top-left); (b) binary 
image (top-right); (c) distance function of (b) (mid-left); (d) level lines and ultimate 
erosion (mid-right); (e) watersheds of the distance function (bottom-left); (f) final 
segmentation result (bottom-right). 



2.4 Catchment Basin 

The catchment basin C(m) associated with a regional minimum m of a gray-scale 
image regarded as a topographic surface is the locus of the points p such that a 
drop falling at p slides along the surface until it reaches m. 




Fig. 2. Regional minimum, catchment basins, and watershed lines. 



2.5 Watersheds 

The watersheds of a gray-scale image I are the lines that separate the various 
catchment basins of I. 

Watersheds can be obtained by piercing a hole at each minimum of image I and 
immersing the image into a lake. Starting from the minimum of lowest altitude, the 
water progressively fills up the different catchment basins of I. At each pixel 
where the water coming from two or more minima would merge, we build a dam. 
At the end of the immersion procedure, each minimum is completely surrounded by 
dams, which delimit its associated catchment basin. The dams are the watersheds of I. 

In Fig. 1(e), the black lines dividing the catchment basins are the watersheds. 
The final segmentation result for the touching red blood cells is shown in Fig. 1(f). 



3. Implementation 



3.1 Distance Function 

The first step in watershed segmentation is to build a distance 
function from the original image. The distance function can be built from a binary 
image using a distance transformation [7,8]. A distance transformation (DT) is an 
operation that converts a binary image, consisting of feature (1) and non-feature (0) 
pixels, to a gray-scale image where each element has a value that approximates the 
distance to the nearest feature element. The DT can be computed either in parallel or 
sequentially. 

The parallel algorithm can be expressed by the following formula: 




$$v^{m}_{i,j} = \min_{(k,l) \in mask} \left( v^{m-1}_{i+k,\,j+l} + c(k,l) \right)$$ 

where $v^{m}_{i,j}$ is the value of the pixel in position (i, j) in the image at iteration m, 
(k, l) is the position in the mask, and c(k, l) is the local distance from the mask [7,8]. 



The sequential algorithm is given below: 

Input: B, the original binary image 
Output: D, the distance function 

Scan the image in raster order: 
for (j = (size + 1)/2; j < H; j++) { 
    for (i = (size + 1)/2; i < W; i++) { 
        $d_{i,j} = \min_{(k,l) \in \text{forward mask}} ( b_{i+k,\,j+l} + c(k,l) )$ 
    } 
} 

Scan the image in anti-raster order: 
for (j = H - (size + 1)/2; j > 1; j--) { 
    for (i = W - (size + 1)/2; i > 1; i--) { 
        $d_{i,j} = \min_{(k,l) \in \text{backward mask}} ( b_{i+k,\,j+l} + c(k,l) )$ 
    } 
} 

where H and W are the height and width of the image, and size is the size of the mask. 
$b_{i,j}$ is the value of the pixel in position (i, j) in the input image B, and $d_{i,j}$ is the 
value of the pixel in position (i, j) in the output image D. 
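The two-scan scheme above can be sketched in Python as follows. This is a minimal illustration, not the authors' CPAS code. It assumes a 3x3 mask with the 3-4 chamfer weights of Borgefors [7], initialises object pixels to a large value and background pixels to 0 so that the result approximates the distance to the background as in Section 2.1, and updates the output in place so that each scan reuses values it has already computed. The function name `chamfer_distance` and the parameters `a` and `b` are hypothetical.

```python
def chamfer_distance(binary, a=3, b=4):
    """Two-pass 3x3 chamfer distance transform (assumed 3-4 weights).

    binary: list of rows, 1 = object pixel, 0 = background pixel.
    Returns the approximate distance of each pixel to the nearest
    background pixel, in units where a horizontal step costs `a`.
    """
    INF = 10 ** 9
    h, w = len(binary), len(binary[0])
    d = [[INF if binary[y][x] else 0 for x in range(w)] for y in range(h)]

    # Mask entries are (row offset, column offset, local distance).
    forward = [(-1, -1, b), (-1, 0, a), (-1, 1, b), (0, -1, a)]
    backward = [(1, 1, b), (1, 0, a), (1, -1, b), (0, 1, a)]

    def sweep(rows, cols, mask):
        for y in rows:
            for x in cols:
                for dy, dx, cost in mask:
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < h and 0 <= nx < w:
                        d[y][x] = min(d[y][x], d[ny][nx] + cost)

    sweep(range(h), range(w), forward)                            # raster scan
    sweep(range(h - 1, -1, -1), range(w - 1, -1, -1), backward)   # anti-raster scan
    return d
```

Dividing the result by `a` gives distances in pixel units. Pixels outside the image are simply ignored here, which is a simplification compared with the loop bounds used in the text.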



3.2 Regional Minimum Extraction 

In the watershed transformation, the algorithm starts from pre-specified 
markers. These markers stand for the minima of the image and must be detected 
before the algorithm is performed. The most straightforward way to detect the 
minima of the image is to use morphological reconstruction. It can be 
shown that [5] 

$$K(I) = I - R_I(I - 1)$$ 

where $R_I(I - 1)$ stands for the reconstruction of I by I - 1, and 

$$K(p) = 1 \text{ iff } p \in M(I), \qquad K(p) = 0 \text{ otherwise}$$ 

is the indicator function of the minima, M(I) is the set of regional minima of 
image I, and p is a pixel of I. 







We can also use a breadth-first search algorithm to detect the regional 
minima [6]: 

#define INITMARKER -1 
#define NONRGNMIN -2 
input: I, the original image (distance function) 
output: J, the marker image (regional minima labeled with INITMARKER) 

for each pixel p in J { 
    J(p) ← INITMARKER; 
} 

for each pixel p in I and J { 
    if (J(p) == INITMARKER) { 
        if there exists p1 ∈ Ng(p) such that I(p1) < I(p) { 
            J(p) ← NONRGNMIN; 
            Add(Q, p); 
            while (Empty(Q) == FALSE) { 
                p1 ← Get(Q); 
                for each p2 ∈ Ng(p1) { 
                    if (J(p2) == INITMARKER and I(p2) == I(p)) { 
                        J(p2) ← NONRGNMIN; 
                        Add(Q, p2); 
                    } 
                } 
            } 
        } 
    } 
} 



In the algorithm, a queue Q controls the search process. Three queue 
operations are used: 

Add(Q, p): adds the address of a pixel p to the queue Q. 

Get(Q): removes the address of the first pixel from the queue and returns it. 

Empty(Q): returns TRUE if the queue is empty, FALSE otherwise. 

The algorithm produces a marker image J in which the regional minima are 
labeled with INITMARKER. 
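The breadth-first search can be written in runnable form roughly as follows. This is a sketch under assumptions: the image is a list of rows, Ng(p) is taken to be the 8-neighbourhood, and the helper names `neighbours` and `regional_minima` are hypothetical.

```python
from collections import deque

INITMARKER = -1   # pixel still considered part of a regional minimum
NONRGNMIN = -2    # pixel known not to belong to a regional minimum


def neighbours(y, x, h, w):
    """Yield the 8-connected neighbours of (y, x) inside an h x w image."""
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            if (dy or dx) and 0 <= y + dy < h and 0 <= x + dx < w:
                yield y + dy, x + dx


def regional_minima(image):
    """Return a marker image J with regional minima labelled INITMARKER."""
    h, w = len(image), len(image[0])
    J = [[INITMARKER] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if J[y][x] != INITMARKER:
                continue
            level = image[y][x]
            # A lower neighbour means the whole plateau of p is not a minimum.
            if any(image[ny][nx] < level for ny, nx in neighbours(y, x, h, w)):
                J[y][x] = NONRGNMIN
                queue = deque([(y, x)])
                while queue:
                    cy, cx = queue.popleft()
                    for ny, nx in neighbours(cy, cx, h, w):
                        if J[ny][nx] == INITMARKER and image[ny][nx] == level:
                            J[ny][nx] = NONRGNMIN
                            queue.append((ny, nx))
    return J
```

Each pixel enters the queue at most once, so the running time is linear in the number of pixels.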



3.3 Ordered Queue 

An ordered queue (OQ) consists of N separate queues [3,5], where N typically 
stands for the number of gray levels of an image. Each pixel whose address is put into 
the OQ is associated with a priority. The pixel priority equals the gray level of the 
pixel in the image. Clients arrive and are served according to their order of 
priority. Each client is put at the end of the queue corresponding to its level of 
priority, so it is served after all clients with the same priority who arrived before it. 
Only one client may be served at a time. Once the queue of a given priority is empty, 
it is suppressed. If a client with high priority arrives after the suppression of the queue 
to which it belongs, it is put in the queue of highest priority still existing. 

Four operations can be performed on an ordered queue: 

Oq_Init(OQ, N): creates an ordered queue OQ with N priority levels. 

Oq_Add(OQ, p, I(p)): adds the address of a pixel p having priority I(p) to the 
ordered queue OQ. 

Oq_Get(OQ): returns the address of the highest-priority pixel from the ordered 
queue and removes its address from the queue. 

Oq_Empty(OQ): returns TRUE if the ordered queue is empty, FALSE otherwise. 
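A small Python sketch of such an ordered queue is given below. It assumes that the highest priority corresponds to the lowest gray level, in line with flooding from the minimum of lowest altitude, and it treats a level as suppressed once a later retrieval has moved past it. The class and method names are hypothetical.

```python
from collections import deque


class OrderedQueue:
    """N first-in-first-out queues served in order of increasing gray level."""

    def __init__(self, n_levels):
        self.queues = [deque() for _ in range(n_levels)]
        self.current = 0  # lowest level that has not been suppressed yet

    def add(self, pixel, priority):
        # A client whose own queue was suppressed joins the
        # highest-priority queue still existing.
        level = max(priority, self.current)
        self.queues[level].append(pixel)

    def empty(self):
        return not any(self.queues[self.current:])

    def get(self):
        if self.empty():
            raise IndexError("get from an empty ordered queue")
        # Skip (suppress) the levels that have run empty.
        while not self.queues[self.current]:
            self.current += 1
        return self.queues[self.current].popleft()
```

For example, after `a` and `c` are added at level 2 and `b` at level 1, the pixels are served as `b`, `a`, `c`. A pixel of level 0 that arrives once level 2 is being served is appended to the level 2 queue.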



3.4 Watersheds 

The watershed algorithm consists of an initialization phase and a working phase [6]. 

The input is a gray-scale image I and a set of markers M (the regional minima 
of the image I). These markers serve as the sources for the flooding and can 
be represented by a marker image J. Pixels belonging to the marker Mi are assigned 
the label Mi, and non-minima pixels are assigned the value NRGNMIN. 

(1) Initialization phase 

An ordered queue is created with as many priority levels as there are gray 
levels in the image I. All the boundary points of the markers with label Mi in image J 
are entered in the ordered queue. The value of each point in the image I 
determines its priority level in the ordered queue. 

(2) Working phase 

The working phase can be expressed by the following algorithm: 

#define WATERSHED 0   // value of a watershed pixel 
#define NRGNMIN -1    // not a regional minimum 
#define INITOQ -2     // pixel already entered in the ordered queue 
Input: I, the original image 
Output: J, the watershed image 
while (Oq_Empty(OQ) == FALSE) { 
    p ← Oq_Get(OQ); 
    if there are pixels p1, p2 ∈ Ng(p) such that 
       J(p1) > 0, J(p2) > 0, J(p1) != J(p2) { 
        J(p) ← WATERSHED; 
    } 
    else 
    { 
        J(p) ← the unique catchment basin label in Ng(p); 
        for each p1 ∈ Ng(p) { 
            if (J(p1) == NRGNMIN) { 
                J(p1) ← INITOQ; 
                Oq_Add(OQ, p1, I(p1)); 
            } 
        } 
    } 
} 

where Ng(p) stands for the neighborhood of the pixel p. 
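The working phase can be illustrated with the following self-contained Python sketch. It is not the authors' implementation and makes several assumptions: a binary heap with an arrival counter stands in for the ordered queue, Ng(p) is the 4-neighbourhood, and the initial queue holds the unlabelled neighbours of the marker pixels, so that every pixel taken from the queue has at least one labelled neighbour. The function name `watershed` is hypothetical.

```python
import heapq

WATERSHED = 0   # value of a watershed pixel
NRGNMIN = -1    # not a regional minimum, not yet queued
INITOQ = -2     # pixel already entered in the queue


def watershed(image, markers):
    """image: 2-D list of gray levels.
    markers: 2-D list with labels > 0 on the minima, NRGNMIN elsewhere.
    Returns the labelled image J (basin labels and WATERSHED pixels)."""
    h, w = len(image), len(image[0])
    J = [row[:] for row in markers]
    heap, order = [], 0

    def neighbours(y, x):
        for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            if 0 <= y + dy < h and 0 <= x + dx < w:
                yield y + dy, x + dx

    def push(y, x):
        nonlocal order
        # The counter keeps first-in-first-out order within a gray level.
        heapq.heappush(heap, (image[y][x], order, y, x))
        order += 1

    # Initialization phase: queue the pixels that touch a marker.
    for y in range(h):
        for x in range(w):
            if J[y][x] > 0:
                for ny, nx in neighbours(y, x):
                    if J[ny][nx] == NRGNMIN:
                        J[ny][nx] = INITOQ
                        push(ny, nx)

    # Working phase: flood in order of increasing gray level.
    while heap:
        _, _, y, x = heapq.heappop(heap)
        labels = {J[ny][nx] for ny, nx in neighbours(y, x) if J[ny][nx] > 0}
        if len(labels) > 1:
            J[y][x] = WATERSHED          # two basins meet: build a dam
        else:
            J[y][x] = labels.pop()       # extend the unique basin
            for ny, nx in neighbours(y, x):
                if J[ny][nx] == NRGNMIN:
                    J[ny][nx] = INITOQ
                    push(ny, nx)
    return J
```

On a one-row image with two markers at the ends, the two basins grow towards each other and the crest pixel between them is labelled WATERSHED.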



4. Experimental Results 

The watershed transformation technique for blob image segmentation was 
applied to detect broken medicine tablets. The image data from a TV camera was 
640×480 pixels with 24 bits of color depth, see Fig. 3(a). There are two broken pills, 
and all the pills are touching each other. The image was first converted to a binary 
image, and then processed by opening and closing with a circular structuring element 
to smooth the edges of the binary image. From this binary image a distance function 
was built, see Fig. 3(b). The watershed segmentation results are given in Fig. 3(c)-(f). 
Fig. 3(c) is the tessellation of the segmented pills, Fig. 3(d) shows the contours, 
Fig. 3(e) is the watershed of the inverted distance function, and the final result is 
shown in Fig. 3(f). All the experimental results were produced with CPAS, a software 
package based on mathematical morphology operations built by the authors. The 
analysis results are given in Table 1. M1, M2 and M3 are the first three invariant 
moments [10], S is the size of the pills in pixels, and C is the circularity of the 
tablets. From these data, it is easy to detect the broken tablets. 

The watershed transformation for segmenting overlapping objects, that is, for 
separating them from each other, has been described. It was shown to be a powerful 
tool for contour detection and touching blob segmentation. To perform this method, 
markers indicating the individual objects must be extracted. Ultimate erosion 
was used to mark the objects to be segmented, but marker-driven watershed 
segmentation methods do not always yield perfect results. Firstly, this method only 
cuts the touching objects and does not reconstruct them. Secondly, the result is 
governed by the markers, that is to say, each object must have one and only one 
marker, otherwise the objects will be either over-segmented or not segmented. Thirdly, 
the method only applies to the segmentation of convex overlapping objects. These 
problems are under further investigation. 



Table 1. Experimental results 

                                 M1        M2          M3         C       S 
Average data of unbroken pills   767.72    488.16      44.69      0.88    4676 
Broken pill a                    513.00    74059.76    8045.30    0.68    2549 
Broken pill b                    217.17    5156.00     1548.90    0.70    1206 
Broken pill c                    245.37    1562.67     1080.50    0.71    1438 








5. Conclusions 




Fig. 3. Broken tablet detection by watershed transformation. (a) Original image; (b) distance 
function; (c) tessellation of the segmented tablets image; (d) contours of the tablets image; (e) 
watershed of the distance function; (f) final result. 



References 

1. A. Martelli, Edge detection using heuristic search methods, CGIP, pp. 169-182, 
1972. 

2. J. M. Lester, et al., Two graph searching techniques for boundary finding in white 
blood cell image, Comput. Biol. Med., Vol. 8, pp. 293-308, 1978. 

3. F. Meyer and S. Beucher, Morphological segmentation, J. Visual Comm. and 
Image Representation, 1(1), pp. 21-46, 1990. 

4. L. Vincent and E. R. Dougherty, Morphological segmentation for textures and 
particles, Digital Image Processing Methods, E. R. Dougherty ed., Marcel 
Dekker Inc., 1994. 

5. S. Beucher and F. Meyer, The Morphological Approach to Segmentation: The 
Watershed Transformation, Digital Image Processing Methods, E. R. Dougherty 
ed., Marcel Dekker Inc., 1994. 

6. Bogdan P. Dobrin, et al., Fast watershed algorithms: analysis and extensions, 
SPIE Vol. 2180, Nonlinear Image Processing V, pp. 902-920, 1994. 

7. Gunilla Borgefors, Distance transformation in digital image, Computer Vision, 
Graphics, and Image Processing, 34, pp. 344-371, 1986. 

8. Gunilla Borgefors, Distance transformation in arbitrary dimensions, Computer 
Vision, Graphics, and Image Processing, 27, pp. 321-345, 1984. 

9. Kenneth R. Castleman, Digital Image Processing, Prentice Hall Inc., 1996. 

10. David Vernon, Machine Vision, Prentice Hall International (UK) Ltd., 1991. 

11. J. Serra, Image Analysis and Mathematical Morphology, Academic Press, 1983. 




A Novel Fusion of Holistic and Analytical Paradigms 
for the Recognition of Handwritten Address Fields 



Chin Keong Lee and Graham Leedham 

Nanyang Technological University, School of Applied Science 
N4-02a-32 Nanyang Avenue, Singapore 639798 
Travck@yahoo . com 
asgleedham@ntu . edu . sg 



Abstract. A novel scheme of automatic address interpretation for the 
recognition of unconstrained address fields is presented in this paper. This 
hybrid method fuses a holistic paradigm with an analytical approach to 
handwritten word recognition for an identifiable address field, such as building 
number, to reduce the error and rejection rates. The holistic paradigm uses 
fuzzy membership, interclass distance measures in feature selection and 
extraction using the Minkowski metric of order s=2, dynamic lexicon and linear 
programming techniques. The method was evaluated using a set of 900 binary 
postal images, which contain a mixture of purely cursive and touching discrete 
addresses. Given the image of a handwritten address, our algorithm produced a 
cost-effective delivery point code where 72% of the mail-pieces were correctly 
encoded and 28% were rejected. The error rate was zero on this test set. 



1 Introduction 

The main problems in handwritten address interpretation (AI) are the parsing and 
recognition of a set of correlated entities, such as the postcodes, street names and 
building numbers, in the presence of incomplete information. It is a computer vision 
problem, which has stringent requirements in commercial applications. The task of 
interpreting a handwritten address is made difficult because of the complexity of the 
addresses, the word shape distortion due to non-linear shifting, unpredictable styles of 
writing and the presence of very large lexicons for postcode and street names. 

In this paper, we have applied our ideas and techniques to the specific task of 
identifying and recognising the building number, street name and postcode fields in 
Singapore handwritten addresses. The six-digit postcode system uses the first two- 
digits to represent the sector code that defines a particular area in Singapore, and the 
last four-digit defines a delivery point which can be a house or an office building. 
Two typical examples of Singapore addresses we are attempting to recognise are 
shown in Fig. 1. 

Others have also used a holistic paradigm, for example Tregidgo and 
Downton [14] and Kabir et al. [4], in a syntactic or structural feature recognition 
approach for postcode verification. Mahadevan and Srihari [8] and Mao et al. [9] 
describe research concerning the analytical paradigm of an over-segmentation 
approach using city-state-zip block recognition. A lexicon-driven analytical approach 

R. Loganantharaj et al. (Eds.): lEA/AIE 2000, LNAI 1821, pp. 492-501, 2000. 

© Springer-Verlag Berlin Heidelberg 2000 







using a chain-coded description of the street name cascaded with an analytical model 
was also set out by Madhvanath et al. [7]. A different approach is described by Nuijt 
and Gerwen [10] using a completely probabilistic model for postcode recognition 
without context knowledge. 











Fig. 1. Test samples of binary cursive address images 



Unlike these systems, we present an automatic address interpretation that uses the 
coalition of holistic and analytical features for address field recognition. An empirical 
design for automatic address interpretation of Singapore postal addresses is presented 
to encode the mail-piece, given its binary image (at 200dpi) and the Singapore Postal 
Code Database. 

A salient aspect of our system is that it is able to achieve virtually zero error rates 
by using a coalition of syntactic and analytical approaches to address field 
recognition. We used the following features in the syntactic approach: line/word 
positions, numbers of words, characters, ascenders, descenders and loops and the 
ascender/descender sequence in each word. If there is an error in postcode 
recognition, our proposed method attempts to detect and correct the error by using the 
remainder of the address. This fusion method has two advantages. One is that the 
inaccuracy of the holistic features in the syntactic approach is compensated for, and is 
possibly corrected by the analytical approach. The other is that it strengthens 
verification error detection by reconfirming the verified address against the 
recognised building number. 

The method works well in addresses where an identifiable short alphanumeric 
field, such as building number, is correctly located and recognised with a certain 
confidence. If there is any uncertainty in the analytical paradigm, the system will 
accept the decision from the syntactic approach. We applied the analytical paradigm 
onto a specific group of error-identified addresses from a set of training images to 
reduce rejection errors. This is important because not all addresses can be analytically 
processed due to character interference, such as overlapping, touching, connected, and 
intersecting character pairs, which makes the analytical approach vulnerable to errors. 

The work presented in this paper is not restricted to handwritten address 
interpretation but is also relevant to handwriting recognition in general, because many 
of the tools developed for segmentation and recognition are applicable to other 
handwriting domains, such as forms, questionnaires, bank checks and unconstrained 
handwritten documents. 








2 A Word Separation Method Using Sorted-Components 

Connected component labelling [12] is widely used in image processing for 
segmenting target objects. Many operations, such as feature extraction, address 
segmentation and special mark detection rely heavily on connected component 
analysis. In our approach, the 8-adjacent connected components are first sorted 
according to the component’s starting column position in an ascending order. That is, 
the left-most component is labelled with the lowest number as shown in Fig. 2(a). The 
words in the line are grouped based on the inter-component distance along the sorted 
list and verified against the length of each group (Fig. 2(b)). If a group contains 
multiple words due to over-grouping, it is split at the point with the largest gap in 
the group. This process is repeated until the word length satisfies a 
threshold value. As a result, each unique label is considered a word (Fig. 2(c)). 
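The grouping and splitting steps can be sketched as follows. This is an illustration only: each connected component is reduced to its (start column, end column) extent, and the function name `separate_words` and the two thresholds `gap_threshold` and `max_word_length` are hypothetical parameters, not values from the paper.

```python
def separate_words(components, gap_threshold, max_word_length):
    """Group components of one text line into words.

    components: list of (start_col, end_col) extents of connected components.
    Returns a list of words, each a list of component extents, left to right.
    """
    comps = sorted(components)  # ascending starting column

    # Grouping: a small gap to the group built so far means the same word.
    groups = []
    for comp in comps:
        if groups and comp[0] - max(end for _, end in groups[-1]) <= gap_threshold:
            groups[-1].append(comp)
        else:
            groups.append([comp])

    # Splitting: cut over-long groups at their largest internal gap.
    words = []
    stack = groups[::-1]
    while stack:
        group = stack.pop()
        length = max(end for _, end in group) - group[0][0]
        if length <= max_word_length or len(group) == 1:
            words.append(group)
            continue
        split_at, best_gap, right_edge = 1, None, group[0][1]
        for i in range(1, len(group)):
            gap = group[i][0] - right_edge
            if best_gap is None or gap > best_gap:
                split_at, best_gap = i, gap
            right_edge = max(right_edge, group[i][1])
        stack.append(group[split_at:])   # right part, handled second
        stack.append(group[:split_at])   # left part, handled first
    return words
```

Because the components are sorted once and then reused, the only extra work for word separation is a linear pass plus the occasional split.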





Fig. 2. (a) Sorted components, (b) and (c) Grouping and splitting of words. The labels of the 
components before and after word separation are shown in the boxes. 



The main advantage of our approach to word separation is resource sharing 
through the reuse of the connected components. Hence it is efficient in terms of the 
number of operations compared to applying a separate algorithm, like the horizontal 
and vertical smearing techniques [3]. 



3 The Holistic Paradigm - Rapid Verification of Address Words 

The algorithm is based on the concept that there is a prototype or an ideal element for 
a class, and the degree of membership of each element is directly related to the 
similarity of the element to the ideal or inversely related to its distance from the 
prototype (ideal). 

Let d(X, R) be the distance of a word with feature vector $X = [x_1, x_2, \ldots, x_N]'$ from 
the prototype vector $R = [r_1, r_2, \ldots, r_N]'$ of a class, where $d(X, R) \ge 0$. The fuzzy 
membership value of X in the class, which denotes the degree of belonging of X in the 
class, is 

$$\mu(X) = g\{d(X, R)\} \qquad (1)$$ 





where g is a strictly decreasing function from $[0, \infty)$ to (0, 1]. The form of the 
membership function of this class used in this work is 

$$\mu(X) = [1 + d(X, R)/\eta]^{-1} \qquad (2)$$ 

where $\eta$ is an arbitrary positive constant which has the effect of altering the fuzziness 
of a set [11]. If we consider d to be a weighted Euclidean distance (a special case of 
the Minkowski metric with s = 2 [1]) in Eq. (2) above, its explicit form is 

$$d(X, R) = [(X - R)' A (X - R)]^{1/2} \qquad (3)$$ 

where A is assumed to be a symmetric positive definite weight matrix. Hence, Eq. (1) 
becomes 

$$\mu(X) = g\{[(X - R)' A (X - R)]^{1/2}\} \qquad (4)$$ 
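As a worked illustration of Eqs. (2) and (3), the following Python sketch evaluates the membership of a feature vector for a given prototype. The function names, the plain-list representation of the weight matrix and the default value of `eta` are assumptions made for the example.

```python
import math


def weighted_distance(x, r, A):
    """Eq. (3): d(X, R) = [(X - R)' A (X - R)]^(1/2), with A a list of rows."""
    diff = [xi - ri for xi, ri in zip(x, r)]
    n = len(diff)
    quad = sum(diff[i] * A[i][j] * diff[j] for i in range(n) for j in range(n))
    return math.sqrt(quad)


def membership(x, r, A, eta=1.0):
    """Eq. (2): mu(X) = [1 + d(X, R) / eta]^(-1)."""
    return 1.0 / (1.0 + weighted_distance(x, r, A) / eta)
```

With the identity matrix as A, prototype R = (0, 0) and eta = 5, the word with features X = (3, 4) lies at distance 5 and receives membership 0.5, while the prototype itself receives membership 1.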

Assuming that the functional form of g is known, the parameters to be estimated 
are R and A, based on a set of training samples for each of which the feature vector X 
and the fuzzy membership value $\alpha$ are known. Approximate membership values can 
be obtained from the qualitative levels (e.g. very good, good, bad, etc.) of those 
samples. 

Suppose there are h words from the address image with feature vectors 
$X_j = [x_{1j}, x_{2j}, \ldots, x_{Nj}]'$ and membership values $\alpha_j$, j = 1, 2, ..., h. From Eq. (4) we get 

$$\alpha_j = \mu(X_j) = g\{[(X_j - R)' A (X_j - R)]^{1/2}\} \qquad (5)$$ 

In this methodology, we considered the case where R is known and A is a 
symmetric positive definite matrix of which non-diagonal elements are not 
necessarily zero. That is, pairwise interaction among the features may be present. To 
estimate the N diagonal elements of A, say $a_i$, i = 1, 2, ..., N, with h > N, from Eq. (5) we 
get 

$$[g^{-1}(\alpha_j)]^2 = (X_j - R)' A (X_j - R) = Y_j' A Y_j, \quad \text{where } Y_j = X_j - R$$ 

$$= \sum_i Y_{ij}^2 a_i, \quad \text{since matrix } A \text{ is diagonal} \qquad (6)$$ 

where $Y_j = [Y_{1j}, Y_{2j}, \ldots, Y_{Nj}]'$, i = 1, 2, ..., N and j = 1, 2, ..., h. 

Our aim is to obtain an $a = [a_1, a_2, \ldots, a_N]' \in \mathbb{R}^N$ (N-dimensional space) such that a 
satisfies all the h equalities. But such a solution vector will not, in general, exist when 
h > N. So we relax the equalities in Eq. (5) into inequalities 

$$\alpha_j - \varepsilon \le \mu(X_j) \le \alpha_j + \varepsilon, \quad \text{where } \varepsilon > 0 \qquad (7)$$ 

or 

$$[g^{-1}(\alpha_j - \varepsilon)]^2 \ge (X_j - R)' A (X_j - R) \ge [g^{-1}(\alpha_j + \varepsilon)]^2$$ 

If we take $\varphi_j(\varepsilon) = [g^{-1}(\alpha_j - \varepsilon)]^2$ and $\lambda_j(\varepsilon) = [g^{-1}(\alpha_j + \varepsilon)]^2$ we have 

$$\varphi_j(\varepsilon) \ge Y_j' A Y_j \ge \lambda_j(\varepsilon)$$ 




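or 

$$\varphi_j(\varepsilon) \ge \sum_i Y_{ij}^2 a_i \ge \lambda_j(\varepsilon), \quad i = 1, 2, \ldots, N$$ 

We define the solution space $S(\varepsilon)$ as 

$$S(\varepsilon) = \{\, a = [a_1, \ldots, a_N]' \in \mathbb{R}^N : \varphi_j(\varepsilon) \ge Z_j' a \ge \lambda_j(\varepsilon),\ j = 1, \ldots, h \,\} \qquad (8)$$ 

where $Z_j = [Y_{1j}^2, Y_{2j}^2, \ldots, Y_{Nj}^2]'$ and $S(\varepsilon)$ is a closed and convex set. To obtain an estimate 
for a we look for a non-empty $S(\varepsilon)$. If S(0) is non-empty, it is a singleton. Otherwise 
we can choose sufficiently large $\varepsilon$ such that $S(\varepsilon)$ is non-empty. 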
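To make the role of the tolerance concrete, the sketch below computes, for a candidate weight vector a, the smallest tolerance for which the inequalities of Eq. (7) hold for every training word. It assumes the membership form of Eq. (2) with a diagonal A, and it does not search for a itself, which the paper does with linear programming. The function name `smallest_epsilon` and its parameters are hypothetical.

```python
import math


def smallest_epsilon(a, Y, alpha, eta=1.0):
    """Smallest tolerance such that every word satisfies Eq. (7).

    a:     candidate diagonal weights a_1..a_N
    Y:     list of difference vectors Y_j = X_j - R
    alpha: target membership values alpha_j
    """
    eps = 0.0
    for y_j, alpha_j in zip(Y, alpha):
        d = math.sqrt(sum(a_i * y * y for a_i, y in zip(a, y_j)))  # Eq. (3), diagonal A
        mu = 1.0 / (1.0 + d / eta)                                 # Eq. (2)
        eps = max(eps, abs(mu - alpha_j))
    return eps
```

A return value of 0 means that the candidate satisfies all h equalities of Eq. (5) exactly. Any larger value is the tolerance at which the candidate just enters the solution space.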



4 The Analytical Paradigm in Recognising Alphanumeric Fields 

A building number is found in most Singapore addresses and can be categorised into 
HDB block number or non-HDB building number. The difference between them is 
the presence of the word “BLOCK” which is written before the building number. This 
is usually not included in non-HDB addresses such as apartments, condominiums, 
commercial buildings and walk-up addresses. 

An analytical approach to handwritten word recognition refers to dissecting a word 
into character fragments and uses the character as the unit of word recognition. An 
analytical paradigm, which basically includes character segmentation and character 
classification, is described as follows. Firstly, the location of the building number for 
the non-HDB address is determined from a given image. The building number 
contains alphanumeric characters. The validity of the building number is checked 
using the a priori knowledge of the address fields. Then it is segmented into 
characters and recognised by a neural network OCR algorithm. Using this information 
together with decision-theoretic checks on the verified postcode can further reduce the 
wrongly verified postcodes or wrongly destined addresses. 

The Methodology for Building Number Recognition: 
1. Building number detection for Non-HDB addresses 

• The first word of every line is checked (usually, the building number is the 
first word in a line). 

• Character segmentation. Non-touching characters are checked against the 
character’s width and the number of objects counted within the word. As the 
majority of people write building numbers discretely, the algorithm deals 
with non-touching characters at this stage. However, interference between 
adjacent letters is common in handwriting and it must be solved in future. 
We used a vertical histogram method to detect and segment the building 
number (Fig. 3). Alternatively, the building number can be dissected by 
decomposition of an image into smaller units [13]. 

2. Recognising the isolated characters in the building number word 

• The initial character is checked. If it is an alphabetic character, or a hash (#) 
symbol exists in the address, the confidence of the word is negated. 

• Else, the confidence of the word is the product of the confidence values of all 
the top-ranked characters. 

• The recognised building number and its confidence value are recorded. 
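The confidence rule of step 2 can be sketched as below. The sketch assumes that "negated" means a sign change of the confidence value, and that the OCR stage delivers one (character, confidence) pair per segmented character. The function name and its arguments are hypothetical.

```python
from math import prod


def building_number_confidence(chars, address_text=""):
    """chars: top-ranked OCR result per character, as (character, confidence).
    address_text: recognised text of the address, checked for a hash symbol.
    Returns the recognised building number and its confidence value."""
    word = "".join(ch for ch, _ in chars)
    confidence = prod(conf for _, conf in chars)
    # Assumption: negating the confidence means flipping its sign.
    if word[:1].isalpha() or "#" in address_text:
        confidence = -confidence
    return word, confidence
```

For the building number "10A" of Fig. 3 with character confidences 0.9, 0.8 and 0.5, the word confidence is their product, about 0.36.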








Fig. 3. Building number “10A” detection and segmentation using vertical histogram method 



5 Fusion of Holistic and Analytical Paradigms 

In this paper, the holistic paradigm in automatic address interpretation refers to the 
paradigm of treating the handwritten word as a single, indivisible entity and 
attempting to interpret it using features of the word as a whole. This is in opposition to 
the analytical paradigm, wherein the word is treated as a collection of simpler sub- 
units such as characters. The power of the holistic paradigm is greatly diminished in 
such scenarios due to the great variability in word shape as well as the propagation 
errors, which result from each feature class extraction. The central issue investigated 
in this paper is the potential role of fusing these two paradigms together with a 
thresholding-control scheme for the address interpretation problem. Our objective is 
to reduce both the rejection and error rates. 



Fig. 4. Schematic diagram of the fusion strategy (input: image; output: postcode) 

In the holistic paradigm, coarse features of the word shape such as ascenders, 
descenders, ascender/descender sequence and loops are identified as being effective in 
a large and dynamic lexicon. A methodology of heuristic prediction of “ideal” 
features of lexicon words from ASCII and appropriate matching strategies are 
proposed as a solution to the address interpretation issue [4]. While coarse holistic 
features are insufficient for fine distinction between lexicon entries, they are sufficient 
to identify lexicon entries that are very “different” from the image. This property 
forms the cornerstone of the holistic verification and lexicon hole reduction in our 
previous approach [5], [6]. Holistic verification is the task of determining whether or 
not a given image of a handwritten address phrase, such as street-name, is indeed that 
of a given ASCII phrase. To precisely identify this phrase or prevent an early-match, 
we appeal to the lexicon hole reduction, that is, the task of increasing the percentage 
of the correct address present in the lexicon. The block diagram of this strategy is 
shown in Fig. 4. 

Algorithm of the thresholding-control: 
Postcode step function δ is defined as, 

    δ = 1 ; top ranked valid hypothesis 
        0 ; otherwise 

For δ(1), 

/* Checking whether the top ranked valid hypothesis 
   is an OCR postcode */ 

if ( Dictionary modified, i.e. NOT OCR postcode ) then 
    if ( C_q > … ) then 
        ACCEPT (postcode) ; 
    else 
        REJECT (postcode) ; 
    fi ; 
else /* Not Dictionary modified */ 
    if ( Matching_BuildingNumber ( . . ) ) then 
        ACCEPT (postcode) ; 
    else 
        REJECT (postcode) ; 

For δ(0), 

/* Recovery unit */ 
if ( … > T_rec ) then 
    ACCEPT (postcode) ; 
else 
    REJECT (postcode) ; 
fi ; 



• The thresholds are related as $T_{acc} < T_{acc}^{2} < T_{rec}$. 

• A second, higher threshold value, $T_{acc}^{2}$, is used to avoid wrong classification in the 
case of high probability of uncertainty in the OCR postcode. 

• The recovery level, $T_{rec}$, is the threshold value (extremely high) used for recovery 
purposes based on the address similarity. 



For Matching_BuildingNumber(..), 

If all the rules below are satisfied, it returns FALSE; otherwise it returns TRUE. 

Rule 1: The confidence value of the building number is greater than a threshold 
        value, AND 
Rule 2: The recognised building number does not match the associated 
        hypothesis building number, AND 
Rule 3: For address type “U”, the address score of the top hypothesis is greater 
        than a threshold value, OR 
        for address type “S”, the top and second top hypotheses possess equal 
        address scores greater than a threshold value. 

where “U” is a Walk-up address and “S” is a Standard address based on the Singapore 
Postal Code Database, Issue 1/97. 
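The three rules can be read as a single boolean test. The following is a minimal sketch of that reading, not the authors' implementation: the `Hypothesis` record, the parameter names and the use of two separate thresholds (`conf_threshold`, `score_threshold`) are assumptions made for illustration, since the paper only speaks of "a threshold value".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Hypothesis:
    """Hypothetical record for one ranked address hypothesis."""
    building_number: str
    address_score: float
    address_type: str  # "U" (walk-up) or "S" (standard)


def matching_building_number(recognised_number: str,
                             confidence: float,
                             top: Hypothesis,
                             second: Optional[Hypothesis],
                             conf_threshold: float,
                             score_threshold: float) -> bool:
    """Return False only if Rule 1 AND Rule 2 AND Rule 3 all hold."""
    # Rule 1: the building number was recognised with high confidence.
    rule1 = confidence > conf_threshold
    # Rule 2: the recognised number contradicts the hypothesis.
    rule2 = recognised_number != top.building_number
    # Rule 3: depends on the address type of the top hypothesis.
    if top.address_type == "U":
        rule3 = top.address_score > score_threshold
    else:
        rule3 = (second is not None
                 and top.address_score == second.address_score
                 and top.address_score > score_threshold)
    return not (rule1 and rule2 and rule3)
```

With this reading, a confidently recognised building number that contradicts a high-scoring hypothesis yields FALSE, which leads to REJECT in the thresholding-control algorithm.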







6 Experiments 

Sample handwritten addresses were collected from a wide and random population. 
Three hundred people were asked to write three addresses on special white printed 
cards, which had dropout colour boxes and lines. The printed boxes were used to 
write the postcode field while the rest of the address fields were written in a totally 
unconstrained manner with horizontal guidelines only. In this way, the postcode digit-string 
segmentation sub-problem can be handled separately. The nine hundred 
samples were randomly divided into two sets: a training set of 450 images and a testing 
set of 450 images. Experiments were performed on the training set to evaluate the 
efficiency of the pre-classification modules, and tested on the testing set. Table 1 
summarises the performance of the system. Table 2 compares the performance of the 
system using five different features and classifier methods. Table 3 gives the 
percentage of address images for which the correct postcode was ranked within the 
top k choices. 

From Table 1, we note that in 72.0% of the cases, we have been able to recover one 
postcode, which was wrongly recognised by the OCR algorithm with a dictionary 
check, using the fusion method. This is an important observation because we believe 
that if we extend the analytical paradigm to a multiple hypothesis scenario, we can, in 
future, recover more addresses by using the top k entries. From Table 2, the 
performance of a conventional method using an OCR algorithm for postcode 
recognition is not commercially viable due to its high error rates, even with a 
postcode dictionary check. The performance of the features using holistic or structural 
word shape of street names and building numbers, shown as Method 3 in Table 2, is 
outstanding in terms of the error rate of 0.6% or 3 errors. The proficiency of each 
feature selected and extracted is relatively accurate, and the weakest performing 
feature among the features explored is the number of loops. This is due to the 
inaccuracy in detecting broken loops (the algorithm for broken loop detection is based 
on mathematical morphology [2]), as shown in Fig. 5. 

The consequence is that the correct address is matched with an address score 
lower than that of a spurious address from the lexicon. Usually these cases are rejected in the 
decision-making module. As for the fine classifier system using combined holistic 
and analytical features, shown as Method 4 in Table 2, the fusion method is the best 
choice. In terms of error rate, it has the best performance: no errors. Furthermore, 
the fusion method also improved the recognition rate by 7.4% over the holistic 
method with a low rejection rate of 28.0%. However, it requires more CPU 
resources to segment and recognise the building number analytically. The current 
speed of our system is around 15 seconds per address running on a 300 MHz Sun 
UltraSPARC-II machine. Within this time our system performs postcode extraction and 
recognition, line and word segmentation, multiple hypothesis generation and ranking, 
feature extraction, address database searching, syntax matching and analytical word 
recognition. From Table 3, it achieves a high Top 10 score of 86.4%, showing its upper 
bound. 
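The percentages in Table 1 follow directly from the counts over the 450 test images. As a small worked check (a sketch only; the dictionary layout and the helper name `rate` are illustrative choices), the counts of each experiment sum to the test set size and reproduce the reported rates when rounded to one decimal place:

```python
TEST_SET_SIZE = 450

# (recognised, rejected, errors) per experiment, copied from Table 1.
TABLE_1_COUNTS = {
    "Exp. #1": (284, 166, 0),
    "Exp. #2": (294, 156, 0),
    "Exp. #3": (319, 131, 0),
    "Exp. #4": (320, 130, 0),
    "Exp. #5": (324, 126, 0),
}


def rate(count: int, total: int = TEST_SET_SIZE) -> float:
    """Percentage of `count` in `total`, rounded to one decimal place."""
    return round(100.0 * count / total, 1)


for name, (recognised, rejected, errors) in TABLE_1_COUNTS.items():
    # Every image is either recognised, rejected, or an error.
    total = recognised + rejected + errors
    print(name, total, rate(recognised), rate(rejected), rate(errors))
```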







Table 1. Summary of the performance on the testing set 

Rates         Exp. #1        Exp. #2        Exp. #3        Exp. #4        Exp. #5 
Recognition   63.1% (284)    65.3% (294)    70.9% (319)    71.1% (320)    72.0% (324) 
Rejection     36.9% (166)    34.7% (156)    29.1% (131)    28.9% (130)    28.0% (126) 
Error         0.0% (0)       0.0% (0)       0.0% (0)       0.0% (0)       0.0% (0) 



Table 2. Comparison of the performance before applying the Fusion Method on the testing set 

Methods                                       Recognition Rate   Rejection Rate   Error Rate 
1. Postcode recognition (OCR alone)           80.4% (362)        7.8% (35)        11.7% (53) 
2. OCR + postcode dictionary check            84.9% (382)        7.1% (32)        8.0% (36) 
3. Holistic Features                          69.8% (314)        29.6% (133)      0.6% (3) 
4. Fusion of Holistic & Analytical Features   72.0% (324)        28.0% (126)      0.0% (0) 



Table 3. Percentage of samples for which the correct postcode was ranked within the top k 
choices 

                Top 1    Top 2    Top 3    Top 4    Top 5    Top 6    Top 10 
Fusion Method   73.7%    77.7%    80.9%    82.2%    83.6%    85.3%    86.4% 




Fig. 5. Badly handwritten “Keng Lee Road”. The loops are undetectable. 



In summary, among the methods explored, the combined system utilising both the 
holistic and analytical paradigms for totally unconstrained address field recognition 
achieved a good acceptance rate of 72.0%. 



7 Conclusions 

The objective of any automatic address interpretation system is to minimise a certain 
cost, such as error rates and rejection rates. This paper proposed a method using a 
coalition of holistic features and analytic word recognition of an identifiable address 
field, such as the building number, which is capable of producing a highly cost-effective 
delivery point code for a destination address. In our experiment, 72.0% of the 
postcodes were correctly encoded and none of the address images in the training set 
and testing set were wrongly recognised by the system. The performance is shown to 
be reasonable and viable for Singapore postal applications, given further improvements 
in recognition module performance and in the utilisation of context. Further evaluation on 
a larger database is needed to fully verify the performance figures. Currently, only the 
first choice of the addresses is checked against the building number for a conditional 
match with the aim of reducing errors. It is possible to further improve the performance 
by considering a number of choices of ranked addresses, as the probability of 
the correct address being recognised is then higher. 











References 

1. Devijver, P. A. and Kittler, J. (1982). Pattern Recognition: A Statistical Approach. 
Englewood Cliffs, N.J.: Prentice/Hall International. ISBN: 0136542360. 

2. Haralick, R. M., Sternberg, S. R., Zhuang, X. (1987). Image analysis using mathematical 
morphology. IEEE Trans. on PAMI, 9(4), 532-550. 

3. Hendrawan (1994). Recognition and Verification of Handwritten Postal Addresses. PhD 
thesis, University of Essex. 

4. Kabir, E., Downton, A. C., Birch, R. (1990). Recognition and verification of postcode in 
handwritten and hand-printed addresses. Proc. 10th Int. Conf. on Pattern Recog., 1, 469-473, 
Atlantic City, USA. 

5. Lee, C. K. and Leedham, G. (1998). Automatic sorting of hand-written Singapore postal 
addresses using address knowledge. Proc. 5th Int. Conf. on Control, Automation, Robotics 
and Vision, 2, 893-897, Singapore. 

6. Lee, C. K. and Leedham, G. (1999). Empirical design of a holistic verifier for automatic 
sorting of handwritten Singapore postal addresses. Proc. 5th Int. Conf. on Document 
Analysis and Recognition, 733-736, Bangalore, India. 

7. Madhvanath, S., Kleinberg, E., Govindaraju, V. (1997). Empirical design of a multi-classifier 
thresholding/control strategy for recognition of hand-written street names. Int. 
Journal of Pattern Recognition and Artificial Intelligence, 11(6), 933-946. 

8. Mahadevan, U. and Srihari, S. N. (1999). Parsing and recognition of city, state, and ZIP 
codes in handwritten addresses. Proc. 5th Int. Conf. on Doc. Analysis and Recog., 325-328, 
Bangalore, India. 

9. Mao, J., Sinha, P. and Mohiuddin, K. (1998). A system for cursive handwritten address 
recognition. Proc. Int. Conf. on Pattern Recognition, 2, 1285-1287, Brisbane, 
Australia. 

10. Nuijt, Martijn R. and Gerwen, Emile van (1999). A probabilistic model for postcode 
recognition: a first step towards probabilistic address interpretation. Proc. 5th Int. Conf. 
on Document Analysis and Recognition, 325-328, Bangalore, India. 

11. Pal, S. K. and Majumder, D. D. (1977). Fuzzy sets and decision-making approaches in 
vowel and speaker recognition. IEEE Trans. on Systems, Man and Cybernetics, 7(8), 625-629. 

12. Rosenfeld, A. and Pfaltz, J. L. (1966). Sequential operations in digital image processing. 
Journal of the ACM, 13(4), 471-494. 

13. Shridhar, M. and Badreldin, A. (1987). Context-directed segmentation algorithm for 
handwritten numeral strings. Image and Vision Computing, 5(1), 3-9. 

14. Tregidgo, R. W. S. and Downton, A. C. (1991). High performance character recognition 
for off-line handwritten British postcodes. IEE Colloquium on Binary Image Processing - 
Techniques and Applications, 6/1-6/5. 




Pawian - A Parallel Image Recognition System 



Oliver Hempel, Ulrich Büker, and Georg Hartmann 



Heinz Nixdorf Institute, University of Paderborn 
33095 Paderborn, Germany 

hempel@get.uni-paderborn.de 



Abstract. The Pawian* system is a knowledge-based image recognition 
system that uses parallel programming techniques on several system layers 
to reduce the recognition time factor. The high robustness provided 
by knowledge-based recognition is supported by data- and model-driven 
parallelism. Special search techniques and heuristics provide high 
performance and good time reduction. 

Keywords: Computer Vision, Distributed Problem Solving, Heuristic 
Searching, Knowledge Representation, Model-based Reasoning, Robotics 



1 Introduction 

Image recognition is an important component of today’s industrial production. 
Many different applications from quality control to industrial manufacturing are 
implemented all over the world. Also, many research groups are working in the 
image recognition field to improve and increase the recognition performance. 
The main interest is to move the performance of this discipline, which belongs 
to Artificial Intelligence, closer to human abilities: to obtain high flexibility 
(recognition of partially occluded or noisy objects), fast recognition even with 
smaller systems (workstations without special acceleration hardware), or extensive 
information extraction. The Computer Vision Laboratory at the University 
of Paderborn does research on several aspects of image recognition [4]. One topic 
is the use of parallel computing to increase the recognition performance. The parallel 
knowledge-based image recognition system Pawian presented in this paper 
combines the results of a number of years’ research in this field. 

In the following section, we discuss why it is natural to apply parallel processing 
to computer vision and image recognition. Section 3 describes the recognition 
techniques used in the Pawian system. The parallelization of the recognition 
process is presented in the fourth section. Some examples and results are given 
in section 5. 

* Pawian is an acronym of the German term Parallele Wissensbasierte Analyse 



R. Loganantharaj et al. (Eds.): IEA/AIE 2000, LNAI 1821, pp. 502-512, 2000. 
© Springer-Verlag Berlin Heidelberg 2000 






2 Reasons for Parallel Image Recognition 

The use of parallel programming techniques for low-level image operations is 
quite common [8]. An image as a matrix of pixels is very suitable to be divided 
into several areas that can be processed independently of each other. On the 
other hand, the field of high-level image recognition is rarely covered by 
parallel techniques. However, there are several reasons to apply the paradigm of 
parallel processing to image recognition as well. 



2.1 The Time Factor 

Of course, efficiency is the main reason for parallel computing, and it is valid for nearly 
every application. The availability of small multiprocessor systems makes parallel 
processing possible for smaller tasks as well. Besides, networked workstations and 
PCs have become established in industrial applications in recent years and now form 
a working environment for engineers. However, in knowledge-based image 
recognition the run time plays an important role even apart from the pure hardware 
aspect [3]. 

In general, computer vision systems get their input data from one or more 
optical sensors. These sensors as well as all components moving the sensors in 
the environment (e.g. motors, robots) are resources that can only be used by one 
single process at a time. At first, this seems to be an obstacle for concurrently 
working processes. However, while one process analyses the data of a sensor, 
another process can use this sensor, move it to another point of interest and get 
new input data. Consequently, we obtain a pipeline effect. 
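The pipeline effect can be illustrated with two threads that hand images over through a queue. This is a minimal sketch under stated assumptions, not the Pawian implementation: `grab` and `analyse` are hypothetical callables standing in for sensor access and image analysis, and a single lock models the exclusive sensor.

```python
import queue
import threading


def run_pipeline(viewpoints, grab, analyse):
    """Grab images in one thread while another thread analyses them."""
    sensor_lock = threading.Lock()      # the sensor is an exclusive resource
    images = queue.Queue(maxsize=1)     # hand-over buffer between the stages
    done = object()                     # sentinel marking the end of input
    results = []

    def acquire():
        for viewpoint in viewpoints:
            with sensor_lock:           # only one process may use the sensor
                image = grab(viewpoint)
            images.put(image)           # the sensor is free again while analysing
        images.put(done)

    def process():
        while True:
            image = images.get()
            if image is done:
                break
            results.append(analyse(image))

    stages = [threading.Thread(target=acquire), threading.Thread(target=process)]
    for stage in stages:
        stage.start()
    for stage in stages:
        stage.join()
    return results
```

While `analyse` works on one image, the acquiring thread is already free to move the sensor and grab the next one, which is the overlap described above.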

A second aspect of reducing the time factor is to obtain the most important 
information as fast as possible in order to concentrate the resources on the right query 
point. For example, position estimation is a very important aspect in recognizing spatial 
objects. Instead of one process evaluating one possible position after the other, 
several processes, one for each possible characteristic orientation, run through 
the position estimating algorithm at the same time. 



2.2 Vision as a Parallel Process 

There are further reasons to use parallel processing especially for computer 
vision purposes. Some vision researchers consider the human brain to consist of 
several agents that compete for the best interpretation of what the eyes have 
detected. Others explain the superiority of a human over a machine, e.g. in 
playing board games, by the human's ability to survey the whole game scenario [7]. 
Thus, humans act as parallel processors, and it is natural to study the possibilities 
of parallel programming for the improvement of image recognition performance. 

3 The Pawian System 

The Pawian system is a knowledge-based image recognition system that uses 
parallel programming techniques on several system layers to reduce the recognition 
time factor. This section presents the basic methods used to realize the 
recognition, whereas the next section focuses on the parallel aspect. 

Fig. 1. Main components of the Pawian system (a) and elements of the control 
algorithm (b) 

Fig. 1 a) shows the main components of the Pawian system. The central 
part of the system is the parallel control algorithm. The parallelism is realized 
by multithreading techniques. On the data input side the a priori knowledge 
about an object is stored in the internal knowledge base. The external data 
input is taken from the system environment by the presented scene. The task 
pool is necessary for the parallel realization of the control mechanism. It is 
described in detail in section 4.2. The camera robot-unit emphasizes the active 
vision character of the system. 

The task is to recognize one or several objects in the presented scene. The 
control algorithm starts generating recognition tasks from the knowledge base 
and deposits them in a task pool for distribution purposes. The first tasks are 
typically using further system tools to acquire environment data. Therefore, a 
robot-guided camera grabs images from a presented scene. The extracted envi- 
ronment data combined with the descriptions in the knowledge base are used for 
further task generation. The recognition task is solved when the objects in the 
presented environment have been recognized. If it is not possible to recognize an 
object at first glance, the robot moves to significant sub-structures of the object. 
These sub-structures are also described in the knowledge base. The recognition 
process concludes with a negative result, if it is not possible to recognize an 
object by using all a priori knowledge. 



3.1 The Camera Robot-Unit 

Images of the environment are taken by a camera which is mounted beside the 
hand of a six-degree-of-freedom robot arm. The use of a robot arm for camera 
motion (active robot vision) enables the focussing of single sub-structures of a 
scene from different viewpoints. Thus, even three-dimensional objects can be 
recognized. 








Fig. 2. Concept model (a) and generation of a search tree from a semantic 
network (b) 



3.2 The Knowledge Base 

The knowledge base consists of object models and learned prototypes. The prototypes 
are a collection of features of the concerned objects. At this level, a 
tolerant contour representation is used to obtain a high separability independently 
of the size and orientation of an object. All information concerning an 
object that the recognition system needs is given by an object model. Semantic networks 
are a common paradigm for knowledge representation [2]. Semantic nets 
are linked graphs consisting of nodes and labeled edges. The use of hierarchically 
ordered networks enables us to create object models that become more and more 
detailed through lower network layers. 

The complex nodes of a semantic network, the concepts, describe objects 
by using one or more attributes. The general model of a concept is shown in 
Fig. 2 a). Each attribute has an operation that delivers an attribute value. Thus, 
even procedural elements can be included into the object model. The distinction 
between pre-attributes and attributes enables the execution of different oper- 
ations in two steps: expansion of the search tree (top-down) and instantiation 
of concepts (bottom-up). The pre-attributes are used to recognize objects or 
their parts holistically with neurobiologically motivated operations. Attributes 
mainly describe topological characteristics of sub-structures. If the attribute val- 
ues correspond to the model, an instance of the concept is created. The nodes of 
a semantic network are connected by different edges which carry the semantic 
relations of the connected concepts. 
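The concept model can be sketched as a small data structure. The classes below are an illustrative assumption, not the Pawian data model: the names `Attribute`, `Concept` and `instantiate`, and the use of simple equality to decide whether a value "corresponds to the model", are chosen only for this example.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Optional


@dataclass
class Attribute:
    """An attribute whose operation delivers the attribute value."""
    name: str
    operation: Callable[[Any], Any]
    expected: Any                      # value required by the object model


@dataclass
class Concept:
    """A complex node of the semantic network."""
    name: str
    pre_attributes: List[Attribute] = field(default_factory=list)  # top-down
    attributes: List[Attribute] = field(default_factory=list)      # bottom-up
    parts: List["Concept"] = field(default_factory=list)           # labeled edges omitted

    def instantiate(self, data: Any) -> Optional[Dict[str, Any]]:
        """Create an instance if all attribute values match the model."""
        values = {}
        for attribute in self.attributes:
            value = attribute.operation(data)
            if value != attribute.expected:
                return None            # no instance of this concept
            values[attribute.name] = value
        return values
```

For example, a concept with a single attribute `Attribute("corners", len, 4)` yields an instance for a four-element contour description and `None` otherwise.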



3.3 The Control Algorithm 

The structure of the control algorithm is shown in Fig. 1 b). Several modules that 
can be exchanged for different applications are linked to the inference mechanism 
in the center. Prototypes and object models are the variable system input from 
the knowledge base. The operation module consists of a set of image processing 
operations e.g. for image grabbing, image processing, and robot control. The first 
step in parallel image recognition is to use parallel image processing operations. 
The arrangement of operations depends on the application. The frame grabber 






as a hardware module is used by the image grabbing operations. The heuristic 
module connects the inference mechanism with the task pool. Its purpose is to 
choose arbitrary tasks from the pool for the inference machine and to reduce the 
search space by removing unsuitable nodes from the pool. A description of the 
heuristic module is given in section 4.4. 



Inference In a first step the inference machine applies the recognition sequence 
to recognize objects or sub-structures of objects as a whole (holistic recognition). 
For this reason the semantic network of the object model expands in a top-down 
direction. If the holistic recognition of an object fails, the next lower level of the 
network describing the sub-structures of the object is evaluated. If this succeeds 
or if the lowest layer of the semantic network is reached, the recognized sub- 
structures are composed to the higher object by using the semantic information 
of the network (decompositoric instantiation). Therefore, the inference machine 
traverses the semantic network in a bottom-up direction. This behavior of the 
inference machine enables the recognition system to recognize even complex or 
partially occluded objects and to distinguish similar objects [4]. 

Search Tree Obviously, the composition of detected segments to concepts of 
the object model is a search problem. Fig. 2 b) shows how the control algorithm 
generates a search tree from a semantic network. Pre-attributes (1,2, 3, 4) and 
attributes (5, 6, 7, 8) of the concepts (A,B,C,D,E) are passed through, one after 
the other. In this order they are linked to corresponding search tree nodes. The 
order in which the concepts are evaluated depends on the hierarchical structure 
of the semantic network. The pre-attributes are linked to the search tree during 
the expansion phase, the attributes during the concept instantiation. Branches 
in the search tree occur, if several operands are available for one operation. This 
happens, if either the scene consists of several objects or one object consists of 
several sub-structures. A parallel control algorithm treats each newly created 
path in the search tree as a separate task. Hence, several possible solution paths 
are evaluated and the path with the highest judgment delivers the best result. 

4 Parallel Image Recognition 

4.1 Parallel Control 

To realize a parallel image recognition process, a suitable parallel control algo- 
rithm is implemented. The control algorithm of the Pawian system uses data- 
and model-driven parallelism. The data-driven parallelism is a method that can 
be found in several other applications [1]. It is generated by the input data 
given by the presented scene. In the Pawian system, the distinction between 
two levels influences the degree of data-driven concurrence. At scene level the 
segmentation of an image delivers the number of objects and their location in 
the scene. Each of these objects can be processed by autonomous processes. At 
sub-object level the number and location of all visible sub-structures of each 






object segmented at scene level is delivered. These two levels also appear in the 
hierarchical object model, which leads to the model-driven aspect. It is based on 
the fact that the levels described above do not have only a data-driven component. 
Independently of the number of segmented objects 
or sub-structures, all parts of a scene and all sub-structures of an object modeled 
in the semantic net can be processed concurrently. This new approach based on 
the object model was introduced in detail in [5]. 

Data-Driven Parallelism On the way to the solution of a search, data-driven 
alternatives occur when several objects are offered by the input data to fit to 
the search pattern. In the search tree this behavior is reflected in splitting into 
several branches corresponding to the number of alternatives (e.g. in Fig. 2 b) 
at the nodes 4-1 and 4-2 respectively 6-1 and 6-2). Besides the evaluation of 
the semantic network, to realize a data-driven parallelism the parallel control 
algorithm has to generate the data-driven branches, distribute the arising sub- 
tasks, and put the results together. Also, to obtain a high degree of efficiency the 
following tasks have to be considered in addition to the pure parallel algorithm: 

— Avoidance of multiple evaluations. The nodes of our search trees consist 
of complex operations like feature extraction or classification that might take 
a while to be processed. If several branches of a search tree get the same 
input data, it must be ensured that each element of the presented scene is 
evaluated only once. The result must be available subsequently within the 
entire search tree. 

— Minimization of wait states. If multiple evaluations are avoided, possibly 
several threads must wait for the result of an operation which is evaluated 
by another thread. Mutual blockades (deadlocks) must be avoided and wait 
states have to be minimized. 

— Consideration of common resources. Robot, camera, and data structures 
in the shared memory are resources jointly used by all threads. The 
access to these resources must be suitably sequentialized through semaphores 
to prevent collisions. 

— Distribution of the tasks. Tasks which cannot be executed directly are 
collected in the task pool. There, they can be organized according to suitable 
criteria and assigned to any available threads. Wait states can already be 
avoided by suitable selections. 

— Limitation of the number of threads. To take advantage of the available 
number of processors, the number of active threads should be adapted to the 
number of processors. 

— Termination of sub-branches. To avoid unnecessary calculations and 
wait states most effectively, the control algorithm should also be able to 
decide, which branches of the search tree can be pruned early and which 
branches should be processed with high priority. 

Model-Driven Parallelism The reason for an additional model-driven parallelism 
is to improve the system performance by obtaining decisive information 
early. An exact position reasoning of an object can often be done after the recognition 
of single sub-structures. In this case the control algorithm can concentrate 
its search activities on a specific area of the search tree. Thus, a broad search 
tree can be reduced to a narrow one very early. 

Fig. 3. Generation of a modified search tree from a sample network (a) and 
transformation of the modified search tree to a regular one (b) 

The main idea of model-driven parallelism is to use the information given 
by the object model to generate temporary branches in a modified search tree. 
Whenever a concept splits into several sub-concepts corresponding sub-branches 
are generated in the search tree and the evaluation of these sub-concepts can be 
done in parallel. This way, a modified search tree is created, much broader than 
the regular one but not as deep as necessary to reach a solution. Consequently, 
this modified tree has to be converted into a regular tree. 

Fig. 3 shows the evaluation of the search tree in two steps: the first step covers 
the generation and concurrent evaluation of branches in the modified search 
tree. In the second step, the modified search tree is transformed into the regular 
search tree. Therefore, temporary edges (from node 3 to node 6) are removed 
and sub-trees (beginning with node 4 respectively 6) are connected in the right 
order. Afterwards, the generation of the search tree is finished by evaluating the 
remaining attributes of higher concepts in each branch of the search tree. This 
new method enables a combined data- and model-driven parallelism. 



4.2 Task Pool 

An important component of the parallel control algorithm is the task pool. In fact, 
the task pool is a blackboard-like structure. Here, threads receive search tree 
nodes to continue their work. Also, when a node splits into several sub-branches, 
the newly generated nodes are put into the task pool. The division of the task pool 
into several sections enables the control algorithm to control the inference mechanism 
by assigning different priorities to different search tree nodes. Criteria for 
the classification are e.g. type of operation (use of exclusive resources) or quality 
of the node (evaluated by belief functions). 
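A task pool divided into prioritized sections might be sketched as follows. This is only an illustration: the class name `TaskPool`, the section names and the rule "serve the first non-empty section in priority order" are assumptions, as the paper does not specify how sections are scheduled.

```python
import threading
from collections import deque


class TaskPool:
    """Blackboard-like pool of search tree nodes, split into sections."""

    def __init__(self, sections):
        # Sections are listed from highest to lowest priority.
        self._sections = {name: deque() for name in sections}
        self._lock = threading.Lock()   # the pool is shared by all threads

    def put(self, section, node):
        """Deposit a newly generated search tree node in a section."""
        with self._lock:
            self._sections[section].append(node)

    def get(self):
        """Return the next node from the highest-priority non-empty section."""
        with self._lock:
            for nodes in self._sections.values():
                if nodes:
                    return nodes.popleft()
        return None                     # nothing to do at the moment
```

A thread that has finished a node would call `get()` to receive its next node, and a node that splits into sub-branches would `put` each branch into the section matching its operation type or quality.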



resource 2:   robot           camera          memory 
robot         not possible    grab image      not existing 
camera        not existing    not possible    save image 
memory        store operation results 

Table 1. Combination of different exclusive resources 



4.3 Avoidance of Deadlocks 

Deadlocks occur if a process possesses an exclusive resource and asks for a second 
one which is owned by another process, which itself is requesting the resource 
of the first process. To avoid such blockades, it is necessary to analyse possible 
combinations of system resources. Table 1 shows a simplified combination table 
of the resources used in the Pawian system. Some combinations are not possible, 
others do not occur. A frequent combination is the consecutive use of robot and 
camera for moving the camera to a specific position before taking an image. This 
combination can easily be protected by using semaphores that guarantee that 
the camera can only be used by the process that also possesses the robot. 
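The robot and camera combination can be protected by always acquiring the two resources in the same order. The sketch below uses Python locks in place of semaphores; `move` and `grab` are hypothetical callables for robot motion and image grabbing, and the fixed order (robot first, camera second) is what rules out the circular wait described above.

```python
import threading

robot = threading.Lock()    # exclusive resource: robot arm
camera = threading.Lock()   # exclusive resource: camera


def grab_image_at(position, move, grab):
    """Move the camera to `position` and take an image.

    The camera lock is only ever requested while the robot lock is held,
    so no thread can own the camera and wait for the robot.
    """
    with robot:
        move(position)
        with camera:
            return grab()
```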

A more complex protection area is the management of calculated results 
in a meta memory for later usage. The purpose of the meta memory is to 
make results of attribute calculations evaluated by one thread available to other 
threads (see 4.1). Simultaneous inquiries to the meta memory without blockades 
or blocked waiting are realized by the algorithm shown in Fig. 4. A thread that 
wants to get a result from the meta memory obtains the allowance to use the 
meta memory by allocating a unique semaphore. A state value characterizes the 
state of calculation of the wanted result. If the result is not yet calculated (state 
value = 0), the current thread leaves the meta inquiry, releases the semaphore 
and starts calculating the result. The state value is increased. Thus, the next 
thread that enters the meta memory possessing the semaphore is led to the 
second section of the switch construction (state value = 1). At this point, the 
thread releases the semaphore and suspends because the result is not calculated 
completely. As soon as the result has been submitted all suspended and waiting 
threads are resumed by the result calculating thread (Fig. 4 , algorithm on the 
right). The first thread that gets the semaphore is allowed to read the result 
from the meta memory. Threads that try to get a result after its calculation are 
led straight to the last section of the switch construct, provided that they have 
got the semaphore at the beginning. 
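The meta memory protocol can be sketched with a condition variable, which here stands in for the paper's combination of a semaphore with explicit suspend and resume calls. The class name `MetaMemory` and the `get(key, calculate)` interface are assumptions for illustration; the three state values correspond to those in the description above. The sketch assumes that `calculate` does not raise an exception.

```python
import threading

NOT_CALCULATED, IN_PROGRESS, READY = 0, 1, 2


class MetaMemory:
    """Shares attribute results so that each one is calculated only once."""

    def __init__(self):
        self._cond = threading.Condition()   # guards both dictionaries
        self._state = {}
        self._result = {}

    def get(self, key, calculate):
        with self._cond:
            if self._state.get(key, NOT_CALCULATED) != NOT_CALCULATED:
                # Another thread owns the calculation: wait without
                # holding the lock until the result is ready.
                while self._state[key] != READY:
                    self._cond.wait()
                return self._result[key]
            # First inquiry: mark the entry and leave the meta memory.
            self._state[key] = IN_PROGRESS
        value = calculate()                  # done outside the lock
        with self._cond:
            self._result[key] = value
            self._state[key] = READY
            self._cond.notify_all()          # resume all waiting threads
        return value
```

Because the calculation runs outside the protected region, inquiries for other entries are never blocked by a long-running operation.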



4.4 Heuristics 

The striking characteristic of parallel search is the existence of several search 
paths at the same time. Whereas sequential search algorithms try to find the 
optimum path by traversing a search tree following specific rules or heuristics, the 
idea of parallel search is that several possible solutions of different 
quality are found at the same time. Since even the assignment of a few labels to a 
few objects generates a broad search tree, search space reduction is a crucial 
point in parallel search. 

    // Meta Memory Access 
    get_semaphore; 
    open_meta_memory; 
    get_state_value_of_suitable_entry; 
    switch { 
        state_value = 0:    // result not calculated 
            set_state_value = 1; 
            break; 
        state_value = 1:    // result not ready 
            release_semaphore; 
            suspend_process; 
            get_semaphore; 
        state_value = 2:    // result ready 
            get_result; 
    } 
    release_semaphore; 

    // Calculate Meta Result 
    state_value = 0; 

    get_semaphore; 
    increase_state_value; 
    release_semaphore; 

    calculate_result; 

    get_semaphore; 
    increase_state_value; 
    release_semaphore; 

    resume_suspended_processes; 

Fig. 4. Meta memory access without blocked waiting or deadlocks. Access algorithm 
(first listing) and calculation of result (second listing), possibly executed by different 
processes at the same time 

In the Pawian system, the base for reducing the search space is given by 
implemented rules. In addition, once generated paths have to be pruned at run 
time by judging the quality of a path. Since the model-driven parallelism does 
not allow an early search tree pruning, a suitable alternative was developed by 
copying only the best branches during the transformation from the modified to 
the regular search tree. The benefit of this approach is the search space reduction 
not only in case of successful recognition but even in case of failed recognition. A 
detailed description of the applied search space reduction methods can be found 
in [6].



5 Examples and Results 

The Pawian system was implemented and tested on a Sun Sparc1000 with
four 50 MHz processors. To visualize the parallel process, the diagram in Fig. 5
a) has been evaluated. The presented scene consisted of one object with 7
sub-structures. This object was also modeled in the semantic network and
represented by the used prototypes. To emphasize the high-level parallelism, low-level
operations have been executed sequentially for these measurements.

The main program thread starts the inference process by creating a second 
thread and suspends itself (Fig. 5, phase 1). The transition between the sequential 
and the parallel area of the recognition process emphasizes the two levels men- 
tioned in section 4.1. The diagram shows the gap between the image grabbing 
of the object and the creation of new threads following the result of the segmen- 
tation (2). At this time, three further threads have been generated. Therefore, 
the maximum number of threads during the recognition is four, following the 






Wall Clock Time

Scene No.      1          2
Objects        1          3
Sub-struct.    7          13
Seq. System    83 sec.    180 sec.
Pawian         73 sec.    136 sec.
Reduction      12%        24%



Fig. 5. System behavior (a) and run time measurement (b) 



number of processors. The subsequent synchronization wait times (3) can be 
assigned to the moving to and grabbing of the 7 sub-structures of the object. 
Afterwards, wait times occur during the transformation of the search tree. At 
the end of the recognition (4), the main program thread resumes and finishes 
the program. 

The measurement in Fig. 5 b) compares the Pawian system with a sequential
system that uses the same recognition strategies. The recognition probability
is also the same in both systems. Two scenes with a different number of objects
and sub-structures have been evaluated. Partially occluded objects influence
the system performance regarding the number of sub-structures. The time
reduction achieved by the Pawian system rises with the number of objects and
sub-structures, in this example up to 24%.
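The reduction figures can be rechecked from the wall clock times in Fig. 5 b). The short Python sketch below does this arithmetic; the function name is chosen here for illustration only, and the speedup ratio is an extra derived quantity that the paper itself does not report.

```python
def time_reduction_percent(sequential_s, parallel_s):
    """Relative wall clock time saved by the parallel system, in percent."""
    return 100.0 * (sequential_s - parallel_s) / sequential_s


# Wall clock times from Fig. 5 b): scene -> (sequential system, Pawian), in seconds.
measurements = {1: (83, 73), 2: (180, 136)}

for scene, (seq, par) in measurements.items():
    print(f"Scene {scene}: {time_reduction_percent(seq, par):.1f}% reduction, "
          f"speedup {seq / par:.2f}")
```

Rounded to whole percent, the two scenes give the 12% and 24% listed in the table.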



6 Conclusion and Outlook 

We have presented the parallel image recognition system Pawian. Besides
parallel image processing operations, this system provides a parallel inference machine
which is realized by data- and model-driven parallelism. The development
meets the industrial requirements of assembly cells but can also be used for
a wide range of recognition tasks. The parallel system control is especially
beneficial for complex scenes. In further research the Pawian
system will be improved by expanding it from a multithreaded system to a
multi-agent system in which intelligent software agents decide what sub-task should be
done next and communicate and cooperate with each other.



References 

1. L. Bic. Data-driven processing of semantic nets. In Kowalik, editor, Parallel
Computation and Computers for AI. Kluwer Academic Publishers, 1988.

2. N.V. Findler. Associative Networks: Representation and the Use of Knowledge by
Computers. Academic Press, New York, 1979.

3. V. Fischer and H. Niemann. Parallelism in a semantic network for image
understanding. In Bode and Dal Cin, editors, LNCS, volume 732, Berlin, 1993.
Springer-Verlag.

4. G. Hartmann, U. Büker, and S. Drüe. A hybrid neuro-artificial intelligence
architecture. In Jähne, Haußecker, and Geißler, editors, Handbook of Computer Vision
and Applications, volume 3, pages 153-196. Academic Press, San Diego, 1999.

5. O. Hempel and U. Büker. A parallel control algorithm for hybrid image recognition.
In Pan, Akl, and Li, editors, Parallel and Distributed Computing and Systems, pages
206-209, Anaheim, 1998. IASTED/Acta Press.

6. O. Hempel, G. Schriek, and U. Büker. Heuristics in parallel search for
knowledge-based image recognition. In Hamza, editor, Artificial Intelligence and Soft
Computing, pages 160-164, Anaheim, 1999. IASTED/Acta Press.

7. E. Rich. Artificial Intelligence. McGraw-Hill, New York, 1983.

8. A. Saoudi, M. Nivat, and P. Wang. Parallel Image Processing. World Scientific
Publishers, Singapore, 1992.



An Automatic Configuration System for Handwriting 
Recognition Problems 



Cara O’Boyle, Barry Smyth and Franz Geiselbrechtinger 



Department of Computer Science
University College Dublin, Belfield, Dublin 4, Ireland.

{cara.oboyle, barry.smyth, franz}@ucd.ie



Abstract. Handwriting recognition (HR) has always been a challenging
problem for the Artificial Intelligence community, and remains an open issue.
Given the complexity of the HR task (different writing and character styles, and
writing conditions) it is perhaps not surprising that most of the success has
stemmed from the development of task-specific HR systems. In this paper, we
describe the Scribble system for automatically configuring HR systems (from a
library of basic components) for well defined form-recognition tasks. The
Scribble system is novel in that it integrates form design software with an
automatic HR configuration system. The form designer not only allows a user
to design a form, but also provides a means of capturing semantic information
about the form’s fields (type, location, input constraints etc.), which it uses to
guide the configuration process. The result is a specialised HR system that has
been customised for a particular form.

Keywords: Handwriting Recognition; Automatic Configuration; Expert
Systems

Acknowledgements: We gratefully acknowledge the support of Enterprise
Ireland in this research project under the Basic Research Grants scheme - grant
no. SC/1997/620



1. Introduction 

Effective handwriting recognition (HR) has always been an important but distant goal 
for Artificial Intelligence (AI) research. This is primarily due to the complexity of the 
task - a truly effective HR system must cope with a vast diversity of writing and 
character styles as well as a range of writing conditions. The traditional approach has 
been to build highly specialised systems capable of coping with a narrow type of HR 
task (e.g., postal addresses [6],[7] or bank cheques [1],[2]). However, as these systems
are developed manually by HR experts, they are expensive and time consuming to 
develop. Recently, researchers have had some success in developing more general HR 
systems (suitable to recognise a wider range of HR tasks), but these systems suffer 
from reduced recognition reliability. 

This paper describes an alternative approach to HR taken in the Scribble project, 
which focuses on form-filling HR tasks. Scribble automatically configures HR 
solutions for forms using a description of the form and a description of components in 



R. Loganantharaj et al. (Eds.): IEA/AIE 2000, LNAI 1821, pp. 512-521, 2000.
© Springer-Verlag Berlin Heidelberg 2000 







a component library. A HR system consists of a number of different components each 
of which has a particular function (e.g. recognition, segmentation) and is restricted to 
a particular type of HR task (e.g. a recognition component could be restricted to 
recognising numbers). Scribble is novel in that it captures descriptions of the fields of 
a form at the time the form is being designed. Different form-fields are designed to 
hold different types of information (e.g. dates, digits, free text). The descriptions of 
the fields are used to select an appropriate set of components from a HR component 
library, to recognise each field of the form. 

The remainder of this paper describes the Scribble system in more detail. The next 
section describes the basic Scribble architecture, and Section 3 looks at how 
Scribble’s domain knowledge (form and HR component knowledge) is represented 
and acquired. Section 4 describes the expert system that performs the final HR 
configuration process. Finally, Section 5 outlines a brief example of Scribble’s 
configuration procedure for a given recognition task. 



2. The Scribble Approach and System Architecture 

Traditionally the stages of a HR system can be divided into several phases as follows: 
preprocessing; feature extraction; segmentation; recognition; and post-processing. For 
more information on these phases see [4]. Most approaches to HR select and fine tune 
specialised algorithms for each stage according to a specific set of task requirements. 

A form recognition system would normally either be general enough to recognise 
every field of a form (with lower accuracy rates), or be manually designed at great 
expense by a handwriting recognition expert to recognise a very specific form (but 
achieve higher recognition rates). For a small company, which has multiple forms that 
change on a regular basis, the latter of these two approaches would be a very costly 
and impractical solution, and therefore for these companies it would be better to use 
the more general systems. The goal of Scribble is to automatically construct 
specialised systems for forms based on a form description and description of the 
components in a HR component library. What is different about Scribble is that it 
obtains knowledge about the individual fields of the form at the time the form is being 
designed. These descriptions are then used by Scribble to automatically construct 
recognition systems configured to recognise each individual field of the form, in an 
attempt to achieve better recognition rates without increasing the cost of constructing 
such systems. 

The Scribble architecture is composed of two main processing components, the 
Form Designer and the Configurer (see Figure 1). The final Scribble output includes a 
form (to be printed out and filled in as usual) and a recognition system whose 
operation has been tailored for this form. 

The Form Designer is a tool that allows a user to design and represent a new 
form - it includes the usual drag-and-drop functionality and an extensive palette of 
common form components, including different types of fields, tick-boxes, labels, etc. 
Importantly, during form design each form-field will be associated with a set of 
properties that will be used to drive the configuration process for this field. The form 







designer output includes the final form plus a form description, that is, the complete 
set of field properties assigned to this form. 

The Configurer uses a form description to automatically assemble a HR sub- 
system for each form-field by drawing on a library of HR components. These 
components are described according to the operational conditions of each component. 
The configuration task involves the selection of a set of HR components that can 
achieve the recognition goals associated with each form-field. 




As a simple example, consider a basic expense-claim form. The user designs this 
form using the form designer, adding an employee name field, a date field, an expense 
total field and perhaps a description field. During the design each field is described 
using a set of field properties. For example the employee name field is described as a 
single-line text field, and is associated with the employee database, whereas the total 
expense field is described as a single-line digit field, for holding a monetary amount. 
During configuration, various feature extraction, segmentation, and recognition 
components are selected and configured for each field. For instance, for the total field, 
the Configurer would suggest using a digit segmentor (no line segmentor is necessary 
for a single-line field), and a digit recogniser with high accuracy because the field 
contains a monetary amount. 



3. Domain Knowledge Representation & Acquisition 

In order to automatically construct HR systems for form recognition tasks we need
two types of knowledge: knowledge about the form in question and knowledge about
the HR components that are available. Examples of form knowledge include the
location of the fields to be recognised, whether these fields contain numbers or
characters, whether the handwriting is expected to be disjoint or cursive, whether the 
writing is restricted to some known dictionary of words, etc. Examples of component 
knowledge include the type of component (e.g. pre-processing, feature extraction, 
segmentation, recognition, post-processing), its input and output characteristics, and 
its performance characteristics (e.g. speed, reliability etc.). The following sections 
describe the form and component knowledge in more detail. We also describe how 
Scribble’s form design tool doubles up as a novel domain knowledge acquisition tool 
for acquiring and encoding form and component knowledge. 






3.1 Form Knowledge 

A given form is represented as a set of fields. Each form field is described using a set
of pre-defined features defined in a form markup language (FML) that has been
designed with handwriting recognition in mind. Specifically, the following features 
are available: 

• Location: Field co-ordinates plus width and height data. 

• Category: The expected character type of the field (e.g. alpha, numeric). 

• Text Type: Details about the type of text for the field (e.g. cursive, disjoint). 

• Line, Word, and Character Properties: Information about the expected number 
of lines, words, or characters in a field. 

• Lexicon: A reference to any available database or lexicon to validate recognition. 

• Template: Certain fields may conform to common templates (e.g. dates, email). 

• Reliability: Information about the importance of reliability in the recognition 
process (e.g. monetary values require high reliability). 



[Figure 2: tree rooted at Template. First level: Name, Address, Date, Number.
Second level: For-Sur, Sur-For, Postal, Web, Email, US, Euro, Tel, Money, Credit
Card. Third level: Irish, International, Pounds, Dollars.]

Fig. 2. The template property tree

Many of these properties have an associated vocabulary tree. For example,
Figure 2 displays the tree for the template property, which indicates a wide range of
standard templates that can be used to aid the overall recognition of a field value 
(name, address, date and number formats are shown). 
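To make the feature list above concrete, a form field could be modelled roughly as below. This is a sketch, not Scribble's actual data model: the class name, attribute names, the numeric reliability scale, and the example values for an expense total field are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FormField:
    """Hypothetical in-memory form of one FML field description."""
    name: str
    # Location: field co-ordinates plus width and height
    left: float
    top: float
    width: float
    height: float
    category: str                    # e.g. "alpha", "numeric"
    text_type: str                   # e.g. "cursive", "disjoint"
    lines: int = 1                   # expected number of lines
    words: int = 0                   # 0 is assumed here to mean "unspecified"
    characters: int = 0              # 0 is assumed here to mean "unspecified"
    lexicon: Optional[str] = None    # reference to a database or lexicon
    template: Optional[str] = None   # e.g. a node of the template tree
    reliability: int = 1             # assumed scale: higher means more important


def needs_line_segmentor(field: FormField) -> bool:
    """A single-line field needs no line segmentor (see the expense-claim example)."""
    return field.lines != 1


# Illustrative values only: a single-line monetary total field.
total = FormField(name="expense_total", left=1.0, top=5.0, width=4.0, height=0.7,
                  category="numeric", text_type="disjoint", lines=1,
                  template="money", reliability=2)
print(needs_line_segmentor(total))
```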



3.2 Component Knowledge 

In order for the Scribble configuration sub-system to select the appropriate HR 
components for a given form field, it must be possible to associate form field 
properties with component properties. For this reason, a component mark-up language 
(CML) for HR components has been devised based on a detailed study of a wide 
range of HR algorithms covering all of the main HR processing stages (see [4]). The 
following properties have been determined: 

• Type: The type of component (e.g., pre-processing, segmentation, etc.) 

• I/O: The input/output properties of the component. For example, a segmentation 
component may take in a field image as input and output a number of text-line 
images (see Figure 3). 







• Pre-Processing: Indicates whether the component in question requires any pre- 
processing to have been performed. For example, some segmentation algorithms 
are sensitive to slant and so slant correction must be performed prior to 
segmentation. 

• Post-Processing: Possible post-processing stages for the component in question. 

• Feature: Indicates secondary input information for the component such as 
additional features. 

• Category/Text: Similar meaning to the category and text properties in the FML. 

• Lexicon: Indicates whether the component requires a lexicon. For example, a 
holistic word recogniser requires a lexicon to recognise word images. 

• Cost: The computational cost of the component. 

• Reliability: The expected reliability of the algorithm. 




Fig. 3. Line segmentation component: illustrated function and CML description.



For example let us consider a simple HR component to segment a field image into 
text line images (see Figure 3). Very briefly, this algorithm identifies line 
segmentation points from the local minima of a horizontal density histogram over the 
image (a horizontal density histogram is computed by counting the number of black 
pixels in each row of the image). Obviously this is a segmentation algorithm (type) 
and it transforms a single bitmap field image (input) into a number of line images 
(output). The algorithm is slope sensitive and should be used with a slope correction 
component (pre-processing). The CML description for this component is given in 
Figure 3. 
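The histogram-based line segmentation just described can be sketched in a few lines of Python. This is a simplified illustration, not the component from the Scribble library: the image is assumed to be a list of rows of 0/1 pixels (1 = black), slope correction is assumed to have been done already, and placing the cut in the middle of a flat minimum is a choice made here.

```python
def horizontal_density_histogram(image):
    """Count the black pixels (value 1) in each row of a binary image."""
    return [sum(row) for row in image]


def line_cut_rows(histogram):
    """Return one cut row per interior local minimum of the histogram.

    A run of equal values (plateau) counts as a single minimum and is
    cut at its middle row.
    """
    cuts = []
    i, n = 1, len(histogram)
    while i < n - 1:
        j = i
        while j + 1 < n and histogram[j + 1] == histogram[i]:
            j += 1  # extend the plateau of equal values
        if j < n - 1 and histogram[i - 1] > histogram[i] < histogram[j + 1]:
            cuts.append((i + j) // 2)
        i = j + 1
    return cuts


def segment_lines(image):
    """Split a field image into text-line images at the cut rows."""
    cuts = line_cut_rows(horizontal_density_histogram(image))
    bounds = [0] + cuts + [len(image)]
    return [image[a:b] for a, b in zip(bounds, bounds[1:])]
```

A real component would also need to ignore shallow minima caused by noise, which this sketch does not attempt.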



3.3 Form Designer and Domain Knowledge Acquisition Tool 

The Scribble Form Designer tool serves two purposes: it allows the user to
graphically design a form, and it also provides an interface for acquiring domain
knowledge, principally form field knowledge, but also HR component knowledge. A
wide range of forms can be designed using the graphical tools provided. Text, 
graphics, and various types of form fields can be interactively placed on each form. 

The tool also assists in the capture of form knowledge by incrementally compiling 
an FML description of the evolving form as new form fields are defined and 






positioned. Part of the FML description is constructed automatically (e.g. field layout 
properties) as the form is built-up on screen. Additional properties can be added 
manually during the form design (see Figure 4). 

This tool also provides an interface for a handwriting recognition expert to 
manually describe the components in the HR component library using CML 
descriptions. The handwriting recognition expert can manually create and edit 
component descriptions using the pop-up box and menus provided. 




Fig. 4. Screen shot of the Scribble Form Designer 



The output of the Form Designer tool includes the graphical (printed) form and two 
text files, the Form Description File and the Component Description File. The form 
description file contains a listing of all the fields in the form for recognition together 
with their FML descriptions. The Component Description File contains a description 
of all the handwriting recognition components in the Component Library. Both these 
files are read in by the Configurer and used to guide the configuration process. 



4. The Configurer 

The Configurer is a rule-based expert system. Its inputs include the form description 
file (given in FML) and the component description file (given in CML), which are 
used to initialise the expert system’s fact base. The main purpose of the Configurer is 
to use these facts and its rule-base to configure a tailored recognition system for each 
field of the form by selecting the appropriate sequence of HR components to handle 
the pre-processing, feature extraction, segmentation, post-processing, and recognition 
requirements of each form field. 






[Figure 5: flowchart of the Configurer. Inputs: FML form field descriptions and
CML component descriptions. Recoverable steps: choose template system; choose
recogniser; get field input type; for the chosen component, choose pre-processor,
feature and post-processor components; get system graph. Output: a separate HR
configuration for each field of the form.]

Fig. 5. The Configurer 

The basic operation of the Configurer, for the processing of a single form field, is 
outlined in Figure 5. The Configurer starts by checking the component library for a 
template that matches the given field. Templates are purpose built macro-components 
for handling certain popular types of form fields such as dates or email addresses for 
example. If a template is available then the field can be handled directly. 

Usually a given field cannot be solved by a template component. In this case, the 
first job of the Configurer is to select a suitable recognition component using the 
recogniser rules. For example, Figure 6 shows one such rule for recognising fields
that contain numeric, cursive data using a numeric character recognition component. 
Basically the rule indicates that if a recognition component has not already been 
selected for this field, the field which is being recognised is described as being 
numeric and cursive, and the FML description does not indicate that it conforms to a 
particular template, then a recognition component which recognises numeric, cursive 
characters should be chosen to recognise this field. 

Once a recogniser has been selected the next job is to determine whether a 
segmentor is required, and if so, to select a suitable segmentor. Very briefly, if the 
required input of the selected recogniser does not match the field input type (e.g. if the 
recogniser is a character recogniser but the field type is a word field) then a segmentor 
is necessary (e.g. to segment the word image into character images). 
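The input-type comparison can be illustrated with a small Python sketch. The granularity list and the function name are assumptions introduced here; only the idea of comparing the field input type with the recogniser input type comes from the text.

```python
# Assumed ordering of image granularities, from coarse to fine.
LEVELS = ["field-image", "line-image", "word-image", "character-image"]


def required_segmentation_steps(field_input, recogniser_input):
    """List the (from, to) segmentation steps needed to bridge the two input types."""
    start, end = LEVELS.index(field_input), LEVELS.index(recogniser_input)
    if start > end:
        raise ValueError("recogniser input is coarser than the field input")
    return [(LEVELS[i], LEVELS[i + 1]) for i in range(start, end)]


# A line field fed to a character recogniser needs two segmentors.
print(required_segmentation_steps("line-image", "character-image"))
```

If both types are equal the list is empty, which corresponds to the case where no segmentor is required.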

Each HR component in the component library will have a number of feature 
extraction, pre-processing and post-processing components associated with it as part 
of the component description - essentially these associated components act as 







preconditions or post-conditions of the component in question. Each time a 
component is chosen, additional feature extraction, pre-processing, and post- 
processing components must be chosen to fulfil these conditions. For example, the 
segmentation component in Figure 3 requires a horizontal density histogram as 
feature input. The component is sensitive to slope variations and so also requires a 
slope correction component to be used first. This component has no post-processing 
requirements. 



Recogniser-Rule#12

IF

  ;if a recogniser has not been chosen for this field
  (not (required-algorithm
         (algorithm-category recogniser) (formfield ?name)))

  ;the field category = numeric, field text-type = cursive
  (formfield (category numeric) (text-type cursive)
             (name ?name) (template none))

THEN

  ;assert a fact in memory indicating that a component
  ;matching these component properties is required
  (assert (required-algorithm
            (algorithm-category recogniser) (category numeric)
            (text-type cursive) (formfield ?name)
            (algorithm-type recognition-character)))



Fig. 6. An example rule used by the Configurer for selecting the recognition component 
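For readers less familiar with rule-based syntax, the logic of the rule in Figure 6 can be paraphrased in Python. This is only a sketch: facts are represented as plain dictionaries and the function name is invented here, whereas the real Configurer runs the rule inside an expert system shell.

```python
def recogniser_rule_12(formfield, required_algorithms):
    """Fire the rule once for a numeric, cursive field without a template.

    `formfield` is a dict of FML properties; `required_algorithms` is the
    list of already asserted requirement facts. Returns True if the rule fired.
    """
    already_chosen = any(
        req["algorithm-category"] == "recogniser"
        and req["formfield"] == formfield["name"]
        for req in required_algorithms
    )
    matches = (
        formfield["category"] == "numeric"
        and formfield["text-type"] == "cursive"
        and formfield["template"] == "none"
    )
    if already_chosen or not matches:
        return False
    # THEN part: assert that a matching recogniser component is required.
    required_algorithms.append({
        "algorithm-category": "recogniser",
        "category": "numeric",
        "text-type": "cursive",
        "formfield": formfield["name"],
        "algorithm-type": "recognition-character",
    })
    return True
```

The first condition is what stops the rule from firing twice for the same field.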

The Configurer is not just responsible for selecting a set of HR components for a 
given form field. As part of the configuration process, the relative ordering of 
components must also be determined. The Configurer establishes ordering constraints 
in two ways. First, the order in which configuration rules fire provides initial ordering
constraints. Additional ordering constraints are then introduced by additional rules.
For example, suppose a component has a pre-processing constraint that both a slant 
correction and size normalisation component should be used. A slant correction 
component usually alters the size of the image and so should be used prior to a size 
normalisation component or it will essentially undo the work of the normaliser. 
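One way to turn such pairwise ordering constraints into a processing sequence is a topological sort. The sketch below uses Python's standard `graphlib` module (Python 3.9 or later); it illustrates the idea and is not a description of how the Configurer itself resolves orderings. The component names are placeholders.

```python
from graphlib import TopologicalSorter


def order_components(before_constraints):
    """Order components so that every (earlier, later) pair is respected."""
    sorter = TopologicalSorter()
    for earlier, later in before_constraints:
        sorter.add(later, earlier)  # 'later' depends on 'earlier'
    return list(sorter.static_order())


constraints = [
    ("slant-correction", "size-normalisation"),  # slant correction alters image size
    ("size-normalisation", "recogniser"),
]
print(order_components(constraints))
```

Contradictory constraints (a cycle) make `static_order()` raise `graphlib.CycleError`, which would signal an inconsistent rule set.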



5. Component Graphs 

The output of the configuration process is a component graph that specifies the 
ordering relations between all of the selected components for handling the target field. 
These graphs can be converted directly into executable code. The following is an 
example of the configuration result for a single field of a form. The field is a home 
telephone number field and consists of a single line of cursive, numeric text. The field
description and the resulting component graph are presented in Figure 7.








The Configurer selects 7 components in total to recognise the field. Looking at the 
recognition and segmentation requirements first we see the following components 
selected. 

1. It chooses a digit recogniser component (which was developed by Parui [5]) to
recognise touching digits.

2. Next it compares the field input type (line-image) to the input type of the 
recogniser component (character-image). As the inputs are not equal it firstly 
selects a touching digit segmentor component (also described in [5]) to segment the 
image from a word image to character images. 

3. Secondly a word to character segmentor (based on run-length distance between 
connected components as described by Mahadevan & Nagabushnam [3]) is 
selected to segment the line image to word images. 

Next, the Configurer checks the pre-processor, feature-extraction and post- 
processing requirements of each of the above components to select the following 
additional components. 

4. The recogniser requires a feature extraction component that extracts directional 
codes from the image (described in [5]). 

5. The same directional codes component also serves the digit segmentor. 

6. This digit segmentor also requires pre-processing of the image in the form of a 
smoothing algorithm (described in [8]). 

7. Finally, the line to word segmentation algorithm uses connected components (groups
of touching pixels) as an input feature and so a connected component algorithm
(described in [9]) is chosen.



Home Telephone: 



FML Description: 

(formfield (name Text63) (category numeric) (text-type cursive) (line 1) 
(word 0) (character 0) (lexicon none) (template none) (reliability 2) 
(left 0.9982364) (top 21.99647) (width 6.994709) (height 0.6878307))



[Figure 7 graph: a row of components numbered 7, 3, 2, 1 (connected component,
run-length segmentor, Parui82 digit segmentor, Parui82 digit recogniser), with the
supporting components Wang92 smoothing and Parui82 directional codes (shown twice)
above them.]


Fig. 7. Example component configuration for a field 
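The FML description shown in Figure 7 is a flat list of parenthesised key/value pairs, so it can be read with a few lines of Python. The parser below is a sketch written for this one layout; the function names are hypothetical, and treating `none` as a missing value is an assumption.

```python
import re


def _convert(token):
    """Turn an FML token into None, int, float or str."""
    if token == "none":
        return None
    for cast in (int, float):
        try:
            return cast(token)
        except ValueError:
            pass
    return token


def parse_fml(text):
    """Parse '(formfield (key value) ...)' into (fact name, dict of properties)."""
    head = re.match(r"\s*\(\s*(\S+)", text)
    pairs = re.findall(r"\(\s*([\w-]+)\s+([^()\s]+)\s*\)", text)
    return head.group(1), {key: _convert(value) for key, value in pairs}


fml = """(formfield (name Text63) (category numeric) (text-type cursive) (line 1)
(word 0) (character 0) (lexicon none) (template none) (reliability 2)
(left 0.9982364) (top 21.99647) (width 6.994709) (height 0.6878307))"""

kind, field = parse_fml(fml)
print(kind, field["name"], field["category"], field["text-type"])
```

Nested property values would need a proper S-expression parser; the regular expression here only handles the single-level pairs seen in the figure.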






6. Conclusions 

In this paper, we have given an overview of the Scribble system, which takes a novel 
approach to handwriting recognition by automating the configuration of domain- 
specific HR systems through the use of form and HR component knowledge. We have 
focussed on Scribble’s representation issues (specifically the representation of form 
and component knowledge) and its configuration process. 

We believe that Scribble’s approach to integrating form design and configuration 
offers advantages over many existing strategies for developing HR systems. 
Specifically, it builds task specific HR systems for individual forms (and indeed form 
fields) rather than generic systems. We believe that these task specific systems will 
benefit from improved performance characteristics when compared to generic systems 
(as has been shown in the past by other HR research). Work is currently ongoing to 
fully evaluate the performance characteristics of the Scribble system and of the HR 
systems that it generates. 



References 

1. Dzuba, G., Filatov, A., Gershuny, D., Kil, I., Nikitin, V.: Check amount recognition
based on the cross validation of courtesy and legal amount fields. Int. Journal of
Pattern Recognition and A.I., vol. 11, no. 4, pp. 639-655, 1997.

2. Lam, L., Suen, C.Y., Guillevic, D., Strathy, N.W., Cheriet, M., Liu, K., Said, J.N.:
Automatic Processing of Information on Cheques. Int. Conf. on Systems, Man &
Cybernetics, Vancouver, Canada, pp. 2353-2358, 1995.

3. Mahadevan, U., Srihari, S. N.: Hypothesis Generation for Word Separation in
Handwritten Lines. Progress in Handwriting Recognition, proceedings of the
IWFHR5, pp. 515-518, 1996.

4. O'Boyle, C., Smyth, B., and Geiselbrechtinger, F.: Scribble: Configuring
Hand-Writing Recognition Systems from Form Knowledge. Proceedings of the 10th Irish
Conference on Artificial Intelligence and Cognitive Science, pp. 217-222, Cork,
Ireland, September 1999.

5. Parui, S. K., Chaudhuri, B. B., Majumder, D. D.: A Procedure for Recognition of
Connected Handwritten Numerals. Int. J. Systems Sci., vol. 13, no. 9, pp. 1019-1029,
1982.

6. Simoncini, L., Kovacs-V., Zs. M.: A System for reading USA Census '90
Handwritten Fields. D.E.I.S., Int. Conf. on Document Analysis and Recognition,
pp. 82-85, 1995.

7. Srihari, S. N., Shin, Y. C., Ramanaprasad, V., Lee, D. S.: Name and Address Block
Reader. Proceedings of IEEE, vol. 84 (7), 1996, pp. 1038-1049.

8. Wang, P.S.P., Nagendraprasad, M.V., Gupta, A.: A neural net based hybrid
approach to handwritten numeral recognition. From Pixels to Features III: Frontiers
in Handwriting Recognition, (IWFHR '92), Elsevier Science Publishers, 1992.

9. Yanikoglu, Berrin A., Sandon, Peter A.: Off-line Cursive Handwriting Recognition
Using Style Parameters. Technical Report, Department of Mathematics and
Computer Science, Dartmouth College, Hanover, NH, 03755, June 7, 1993.




Detection of Circular Object with a High Speed Algorithm



Adel A. Sewisy 



Department of Mathematics (Computer Science), Faculty of Science
Assiut Univ. Assiut Egypt
sewisy@acc.aun.eun.eg



Abstract. The Hough Transform (HT) can detect straight lines in an
edge-enhanced picture; however, the extension of the HT to recover
circles has been limited by low speed and large storage requirements. The
suggested technique overcomes these major disadvantages of the HT. The
technique has the following advantages: (1) a high-speed search for parallel
points, which form the parallel lines; (2) its operation is based on the
parallel lines in each object; (3) the ability to deal with every object in
an image separately and to output the results for each object immediately
once it has been processed; (4) from (1) and (2) it is concluded that the
technique operates at high speed in comparison with [4,5] and [3].

Keywords: Computer Vision, image processing, Hough Transform, circle
detection, accumulator array.



1 Introduction 

One of the basic tasks in computer vision is the detection of straight lines, circles,
ellipses, etc. in an image. The HT and its variants [2] are methods commonly
used for curve detection. An implementation of the HT transforms each point in an
image into a parameterized curve in the parameter space; the parameter space
is usually represented by an array of accumulators in which each accumulator
corresponds to a specific instance of the model in image space. Finally, the
accumulator with the local maximum is selected, and its parameter coordinates are
used to represent an instance of the model in image space. Direct application
of the HT for detecting ellipses is not practical, because an ellipse is determined by
five parameters $(x_0, y_0, a, b, \theta)$, where $(x_0, y_0)$ are the coordinates of the ellipse
center, $a$ and $b$ are the half lengths of the major and minor axes, and $\theta$ is the
rotation angle. Thus the computation is very expensive both in storage and in
time. Many methods have been proposed for treating this problem. For example, in [4] a
pair of edge pixels was used for the detection of ellipses. Also, in [1] it was shown
how to use the HT for the detection of general arbitrary shapes with any scale
and any orientation. The detection of circles and ellipses using a 2-dimensional
array was proposed in [3]. In [5], a better centre estimate was obtained from a
geometric property, and the other three parameters were then estimated based on the
Adaptive HT. In this paper, we propose a new technique for detecting circles.



R. Loganantharaj et al. (Eds.): IEA/AIE 2000, LNAI 1821, pp. 522-534, 2000.
© Springer-Verlag Berlin Heidelberg 2000






This technique uses parallel lines to detect the circles. The remainder
of this paper is organized as follows: Section 2 describes the image space to parameter
space transformation. Section 3 describes the proposed technique. Section 4 gives
computer experiments, and Section 5 gives a summary and conclusions.

2 Image Space - Parameter Space Transformation 

2.1 Circle 

Given any edge pixel $(x_i, y_i)$ and radius $r$, a circle can be created in parameter
space, i.e.

$(x_0 - x_i)^2 + (y_0 - y_i)^2 = r^2$,

and written as

$x_0 = x_i + r\cos\theta$, (1)

$y_0 = y_i + r\sin\theta$, (2)

where $\theta$ may vary from 0 to $2\pi$. For a group of figure points which lie on the same
circle, a set of circles in the parameter space will be generated. In parameter
space, the circles which correspond to the figure points have a common
intersection point whose coordinates indicate the circle present in image
space. To reduce the computational effort, we introduce a property of a circle.
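As an illustration of equations (1) and (2), the following minimal Python sketch votes in a two-dimensional accumulator for a known radius. It is a generic circle Hough transform, not the technique proposed in this paper; the function names, the image size and the synthetic test circle are assumptions made for the example.

```python
import math

def vote_circle_centres(edge_points, r, width, height, n_angles=360):
    """Accumulate votes for candidate centres (x0, y0) using
    x0 = xi + r*cos(theta), y0 = yi + r*sin(theta), theta in [0, 2*pi)."""
    acc = [[0] * width for _ in range(height)]
    for xi, yi in edge_points:
        seen = set()  # each edge pixel votes at most once per accumulator cell
        for k in range(n_angles):
            theta = 2.0 * math.pi * k / n_angles
            x0 = int(round(xi + r * math.cos(theta)))
            y0 = int(round(yi + r * math.sin(theta)))
            if 0 <= x0 < width and 0 <= y0 < height and (x0, y0) not in seen:
                seen.add((x0, y0))
                acc[y0][x0] += 1
    return acc

def best_centre(acc):
    """Return the (x, y) cell holding the largest vote count."""
    best = max((v, x, y) for y, row in enumerate(acc) for x, v in enumerate(row))
    return best[1], best[2]

# Hypothetical test data: edge pixels of a circle with centre (30, 28), radius 20.
edge = sorted({(int(round(30 + 20 * math.cos(2 * math.pi * k / 90))),
                int(round(28 + 20 * math.sin(2 * math.pi * k / 90))))
               for k in range(90)})
acc = vote_circle_centres(edge, 20, 64, 64)
centre = best_centre(acc)
```

Because the edge pixels are rounded to integer coordinates, the accumulator peak can be expected close to the true centre rather than exactly on it.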



Property for a Circle. Let $I$ denote the binary image and $P(x, y)$ represent
a pixel. If $P(x_i, y_i)$ lies on a circle with radius $r$, the gradient normal at
$P(x_i, y_i)$ points to the centroid of the circle.

Proof:

The equation of a circle is

$(x_0 - x_i)^2 + (y_0 - y_i)^2 = r^2$. (3)

For any edge pixel $P(x_i, y_i)$ on the circle, the gradient of the curve segment at
$P(x, y)$ can be derived by letting

$X = x - x_0$,
$Y = y - y_0$,

so that the equation of the circle can be rewritten as

$X^2 + Y^2 = r^2$.

Differentiating $Y$ with respect to $X$ we have

$\frac{dY}{dX} = -\frac{X}{Y}$.



524 Adel A. Sewisy 



Therefore the slope at $(X_i, Y_i)$ is

$\frac{dY_1}{dX_1} = -\frac{X_i}{Y_i}$, (4)

and the slope of the line connecting the centroid and the pixel $(X_i, Y_i)$ is

$\frac{dY_2}{dX_2} = \frac{Y_i}{X_i}$. (5)

The product of the slopes in equations (4) and (5) is $-1$, so the tangent to the curve
is normal to the line connecting the centroid and the pixel $P(X_i, Y_i)$; see Fig. 1.




Fig. 1. The directions $M_0T$ and $M_0M$ are conjugate



Property 1. The set of all lines having the same $\rho$ in the $XY$-plane is mapped
onto a line $\rho = \text{const}$ in the $(\rho, \theta)$-plane. From this we notice that the
set of all lines having the same $\rho$ envelops a circle with centre at the origin and
radius equal to $\rho$.

Proof.

The equation of a circle centred at the origin of the image is given by

$x^2 + y^2 = a^2$,

then

$y = \pm\sqrt{a^2 - x^2}$

and

$\frac{dy}{dx} = \mp\frac{x}{\sqrt{a^2 - x^2}}$.

Hence $\theta_0$ and $\rho_0$ are given by

$\cot\theta_0 = \pm\frac{x}{\sqrt{a^2 - x^2}}$

and

$\rho_0 = \pm\frac{a^2}{\sqrt{a^2 - x^2}}\sin\theta_0 = \pm a$, (6)






since

$\sin\theta_0 = \frac{\sqrt{a^2 - x^2}}{a}$.



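When the circular object is not centred at the origin, i.e. the equation of the circle is

$(x - x_0)^2 + (y - y_0)^2 = a^2$,

then

$y = y_0 \pm \sqrt{a^2 - (x - x_0)^2}$

and

$\frac{dy}{dx} = \mp\frac{x - x_0}{\sqrt{a^2 - (x - x_0)^2}}$.

Hence $\theta_0$ and $\rho_0$ are given by

$\cot\theta_0 = \pm\frac{x - x_0}{\sqrt{a^2 - (x - x_0)^2}}$

and

$\rho_0 = \left[ y_0 \pm \sqrt{a^2 - (x - x_0)^2} \pm \frac{x (x - x_0)}{\sqrt{a^2 - (x - x_0)^2}} \right] \sin\theta_0$,

where

$\sin\theta_0 = \frac{\sqrt{a^2 - (x - x_0)^2}}{a}$.

Hence

$\rho_0 = \frac{y_0 \sqrt{a^2 - (x - x_0)^2} \pm \left( a^2 - (x - x_0)^2 \right) \pm x (x - x_0)}{a}$. (7)

From the parametric equations of the circle, $x - x_0 = a\cos t$ and $y - y_0 = a\sin t$. Then

$\rho_0 = x\cos t + y\sin t$.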



Property 2. Let $C$ be a continuously differentiable arc; then the set of tangents
to $C$ is connected.

Proof.

Suppose that the arc $C$ has parametric equations

$x = x(t)$, $y = y(t)$. (8)

The slope of the tangent to $C$ at the point $(x, y)$ is obtained by differentiating
equations (8) with respect to $t$:

$\frac{dy}{dx} = \frac{\dot{y}}{\dot{x}}$,

then

$\alpha = \tan^{-1}\left(\frac{\dot{y}}{\dot{x}}\right)$,

where the Hough parameters of this tangent are $\theta = \alpha - \frac{\pi}{2}$ and

$\rho = x\cos\theta + y\sin\theta = x\cos\left(\alpha - \frac{\pi}{2}\right) + y\sin\left(\alpha - \frac{\pi}{2}\right) = x\sin\alpha - y\cos\alpha$, (9)

where

$\sin\alpha = \frac{\dot{y}/\dot{x}}{\sqrt{1 + (\dot{y}/\dot{x})^2}}$

and

$\cos\alpha = \frac{1}{\sqrt{1 + (\dot{y}/\dot{x})^2}}$.

Substituting $\sin\alpha$ and $\cos\alpha$ into equation (9), we get

$\rho = \frac{x\dot{y} - y\dot{x}}{\sqrt{\dot{x}^2 + \dot{y}^2}}$.

Since $x$, $\dot{x}$, $y$ and $\dot{y}$ are continuous functions of $t$, so are $\theta$ and
$\rho$; hence the set of tangents to $C$ maps into an arc in Hough space.
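The closed form for $\rho$ obtained in the proof of Property 2 can be checked numerically on a circle, where the parametric equations and their derivatives are known. The sketch below is only an illustration; the function names and the sample circle are assumptions, not part of the paper.

```python
import math

def tangent_rho(x, y, dx, dy):
    """rho = (x*dy - y*dx) / sqrt(dx^2 + dy^2), with dx, dy the t-derivatives."""
    return (x * dy - y * dx) / math.hypot(dx, dy)

def circle_tangent_rhos(x0, y0, a, n=8):
    """Evaluate rho for n tangents of the circle x = x0 + a cos t, y = y0 + a sin t."""
    out = []
    for k in range(n):
        t = 2.0 * math.pi * k / n
        x, y = x0 + a * math.cos(t), y0 + a * math.sin(t)
        dx, dy = -a * math.sin(t), a * math.cos(t)
        out.append((t, x, y, tangent_rho(x, y, dx, dy)))
    return out

# Centred circle: every tangent should have rho equal to the radius.
centred = circle_tangent_rhos(0.0, 0.0, 5.0)
# Off-centre circle: rho should agree with x cos t + y sin t.
shifted = circle_tangent_rhos(3.0, 4.0, 5.0)
```

For the centred circle this reproduces equation (6), and for the shifted circle it agrees with the relation $\rho_0 = x\cos t + y\sin t$ stated after equation (7).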



Property 3. If $C$ is an arc in the Hough space, then the tangents to $C$ form
a connected arc in the Hough space.

Proof.

Let $C$ be an arc in the Hough space; this arc can be considered to be composed
of a family of tangents (straight line segments). Then the tangent to $C$
at $(x, y)$ has Hough parameters $(\theta, \rho)$, where $\theta = \alpha - \frac{\pi}{2}$ and

$\rho = x\sin\alpha - y\cos\alpha = x\cos\theta + y\sin\theta$.

So the family of tangents maps into the sinusoidal curve $\rho = x\cos\theta +
y\sin\theta$, and these tangents are connected and envelop the arc $C$ in the
Hough space. Note that if $C$ is a closed curve, the set of tangents to $C$ maps
into a closed curve in Hough space.






3 Proposed Technique 

The following algorithm is proposed:

Input: An image Q \* Original image *\

Output: Details of each circular object of Q \* The centre, radius and
CPU time *\

Step 1: Transform Q to E \* Original image to edge points *\

Step 2: Detect parallel lines for each circular object in Q.

Step 3: Calculate the midpoint $(x_0, y_0)$ of the two parallel lines for each circular
object in Q \* detects the centre of a circular object *\

Step 4: Extract all information for each circular object in Q \* $(x_0, y_0)$,
radius, CPU time *\

Step 5: Continue with another circular object in Q? (y/n) \* to deal with
another circular object in Q *\

if y, extract all information for the next circular object in Q,

else

stop.
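Steps 2 to 4 can be illustrated with a minimal Python sketch. It assumes, which the paper does not state, that the parallel lines are the horizontal and vertical tangents of the edge points of a single circle, so that the centre is the midpoint between each pair and the radius is half their separation. The function names and the synthetic circle are hypothetical.

```python
import math

def centre_from_parallel_tangents(edge_points):
    """Estimate (x0, y0, r) of one circle from its edge pixels.

    Assumption (not from the paper): the parallel lines are the vertical
    tangents x = x_min, x = x_max and the horizontal tangents
    y = y_min, y = y_max of the edge points.
    """
    xs = [x for x, _ in edge_points]
    ys = [y for _, y in edge_points]
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    x0 = (x_min + x_max) / 2.0   # midpoint of the two vertical tangents
    y0 = (y_min + y_max) / 2.0   # midpoint of the two horizontal tangents
    r = ((x_max - x_min) + (y_max - y_min)) / 4.0  # mean half-separation
    return x0, y0, r

def synthetic_circle(x0, y0, r, n=360):
    """Integer edge pixels of a circle, used here only as test data."""
    return [(round(x0 + r * math.cos(2 * math.pi * k / n)),
             round(y0 + r * math.sin(2 * math.pi * k / n))) for k in range(n)]

result = centre_from_parallel_tangents(synthetic_circle(30, 28, 20))
```

The sketch handles one isolated object at a time, in the spirit of Step 5; separating the edge points of different objects is outside its scope.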

4 Computer Experiments 

Image 1 (Fig. 2(a), 255 × 255) is a synthetic binary image containing six
circles. Fig. 2(b) shows the extracted edge points of image 1 and the two parallel
lines of the first circle. Fig. 2(c) shows how the parallel lines of the first circle in
Fig. 2(b) are used to detect the centre $(x_0, y_0)$, the radius $r$ and the accumulator
arrays. The first stage of building the accumulator arrays for detecting the centre
$(x_0, y_0)$ of the first circle is shown in Fig. 2(d). Fig. 3(a) shows image 1, and its
extracted edge points are shown in Fig. 3(b). Fig. 3(c) shows the results for the first
circle in Fig. 3(b): the centre $(x_0, y_0)$ is (30, 28), the radius is 20
and the CPU time for detecting the first circle in image 1 is 11.2637 s. The accumulator
arrays of the first circle are shown in Fig. 3(d). Fig. 4(a) shows image 1. The edge
points of image 1 and the two parallel lines of the second circle are shown in Fig. 4(b).
The results of the parallel lines for the second circle in Fig. 4(b) are shown in Fig. 4(c).
Fig. 4(d) shows the steps for plotting the accumulator arrays. Fig. 5(a) shows image 1.
Fig. 5(b) contains five circles. For the second circle in Fig. 5(b), the centre
$(x_0, y_0)$ is (215, 19), the radius is 10 and the CPU time for detecting
the circle is 3.1868 s. Fig. 5(d) shows the accumulator arrays. Fig. 6(a) shows
image 1. Fig. 6(b) contains four circles. Fig. 6(c) shows the results for the third
circle in Fig. 6(b): the centre $(x_0, y_0)$ is (160, 69), the radius is 15 and the
CPU time for detecting the circle is 6.53846 s. Fig. 6(d) shows the accumulator
arrays. Fig. 7(a) shows image 1. Fig. 7(b) contains three circles. The results for
the fourth circle in Fig. 7(b) are shown in Fig. 7(c): the centre $(x_0, y_0)$ is
(100, 139), the radius is 15 and the CPU time for detecting the circle is 6.7033 s.
The accumulator arrays are shown in Fig. 7(d). Fig. 8(a) shows
image 1. Fig. 8(b) contains two circles. Fig. 8(c) shows the results for the fifth circle
in Fig. 8(b): the centre $(x_0, y_0)$ is (190, 189), the radius is 15 and the
CPU time for detecting the circle is 6.86813 s. Fig. 8(d) shows the accumulator
arrays. Fig. 9(a) shows image 1. Fig. 9(b) contains one circle. The results for
the sixth circle in Fig. 9(b) are shown in Fig. 9(c): the centre $(x_0, y_0)$ is
(15, 219), the radius is 10 and the CPU time for detecting the circle is 3.79121 s.
The accumulator arrays are shown in Fig. 9(d).

Image 2 (Fig. 10(a), 255 × 255) is a bitmap image containing three circles
and two ellipses. Fig. 10(b) shows the edge points of image 2. The results for
the first circle in Fig. 10(b) are shown in Fig. 10(c): the centre $(x_0, y_0)$ is
(38, 29), the radius is 13 and the CPU time for detecting the circle is 6.4835 s.
The accumulator arrays are shown in Fig. 10(d). Fig. 11(a) shows image 2. Fig. 11(b)
contains two circles and two ellipses. Fig. 11(c) shows the results for the second circle
in Fig. 11(b): the centre $(x_0, y_0)$ is (236, 29), the radius is 13 and the
CPU time for detecting the circle is 6.4832 s. Fig. 11(d) shows the accumulator
arrays. Fig. 12(a) shows image 2. Fig. 12(b) includes one circle and
two ellipses. The results for the first ellipse in Fig. 12(b) are shown in Fig. 12(c):
the centre $(x_0, y_0)$ is (96, 77), the major and minor radii are 43 and 14,
respectively, and the CPU time for detecting the ellipse is 15.8242 s. The
accumulator arrays are shown in Fig. 12(d). Fig. 13(a) shows image 2. Fig. 13(b)
contains one circle and one ellipse. Fig. 13(c) shows the results for the third circle in
Fig. 13(b): the centre $(x_0, y_0)$ is (100, 136), the radius is 12 and the CPU
time for detecting the circle is 5.43956 s. Fig. 13(d) shows the accumulator
arrays. Fig. 14(a) shows image 2. Fig. 14(b) includes one ellipse. The results
for the second ellipse in Fig. 14(b) are shown in Fig. 14(c): the centre $(x_0, y_0)$
is (160, 205), the major and minor radii are 43 and 19, respectively, and the
CPU time for detecting the ellipse is 16.48 s. The accumulator arrays are
shown in Fig. 14(d).

For comparison, we applied the methods of [4, 5] and [3] to the images in
Fig. 2(a) and Fig. 10(a), implemented in C on a Pentium 200. Table 1
summarizes the final results for these methods and for our method. The method
of [3] fails on images of the kind shown in Fig. 2(a), whereas Table 1 shows that
the methods of [4, 5] and our method are effective for these images.

5 Summary and Conclusions 

The proposed technique has the following advantages: (1) a high-speed search
for parallel points, which form the parallel lines; (2) during its operation, it







Fig. 2. (a) Image 1 (255 × 255), a synthetic binary image containing
six circles. (b) The extracted edge points of image 1 and the two
parallel lines of the first circle. (c) The results for the parallel lines
of the first circle in (b), used to detect the centre, radius and
accumulator arrays. (d) The first stage of building the accumulator
arrays.




Fig. 3. (a) Image 1. (b) The extracted edge points of image 1.
(c) The results for the first circle in (b): the centre $(x_0, y_0)$ is
(30, 28), the radius is 20 and the CPU time for detecting this circle
in image 1 is 11.2637 s. (d) The accumulator arrays of the first
circle.




Fig. 4. (a) Image 1. (b) The edge points of image 1 and the two
parallel lines of the second circle. (c) The results of the parallel
lines for the second circle in (b). (d) The steps for plotting the
accumulator arrays.






Fig. 5. (a) Image 1. (b) contains five circles. (c) The results for
the second circle: the centre $(x_0, y_0)$ is (215, 19), the radius
is 10 and the CPU time for detecting this circle is 3.18681 s. (d) The
accumulator arrays.



Fig. 6. (a) Image 1. (b) contains four circles. (c) The results for
the third circle in (b): the centre $(x_0, y_0)$ is (160, 69), the
radius is 15 and the CPU time for detecting this circle is 6.53846 s.
(d) The accumulator arrays.



Fig. 7. (a) Image 1. (b) contains three circles. (c) The results for
the fourth circle in (b): the centre $(x_0, y_0)$ is (100, 139), the
radius is 15 and the CPU time for detecting this circle is 6.7033 s.
(d) The accumulator arrays.







Fig. 8. (a) Image 1. (b) contains two circles. (c) The results for
the fifth circle in (b): the centre $(x_0, y_0)$ is (190, 189), the
radius is 15 and the CPU time for detecting the fifth circle is 6.86813 s.
(d) The accumulator arrays.




Fig. 9. (a) Image 1. (b) contains one circle. (c) The results for the
sixth circle in (b): the centre $(x_0, y_0)$ is (15, 219), the radius is
10 and the CPU time for detecting the sixth circle is 3.79121 s.
(d) The accumulator arrays.



Fig. 10. (a) Image 2 (255 × 255), a bitmap image containing
three circles and two ellipses. (b) The edge points of image 2.
(c) The results for the first circle in (b): the centre $(x_0, y_0)$ is
(38, 29), the radius is 13 and the CPU time for detecting the first
circle is 6.48352 s. (d) The accumulator arrays.






Fig. 11. (a) Image 2. (b) contains two circles and two ellipses.
(c) The results for the second circle in (b): the centre $(x_0, y_0)$ is
(236, 29), the radius is 13 and the CPU time for detecting the second
circle is 6.48352 s. (d) The accumulator arrays.



Fig. 12. (a) Image 2. (b) includes one circle and two ellipses.
(c) The results for the first ellipse in (b): the centre $(x_0, y_0)$ is
(96, 77), the major and minor radii are 43 and 19, respectively, and the
CPU time for detecting the first ellipse is 15.8242 s.
(d) The accumulator arrays.



Fig. 13. (a) Image 2. (b) contains one circle and one ellipse.
(c) The results for the third circle in (b): the centre $(x_0, y_0)$ is
(100, 136), the radius is 12 and the CPU time for detecting the third
circle is 5.43956 s. (d) The accumulator arrays.







Fig. 14. (a) Image 2. (b) includes one ellipse. (c) The results
for the second ellipse in (b): the centre $(x_0, y_0)$ is (160, 205), the
major and minor radii are 43 and 19, respectively, and the CPU time for
detecting the second ellipse is 16.4835 s. (d) The accumulator
arrays.



Table 1. The final results for the other authors' methods and our method

                               Tsuji and        Yuen et al.   Yip et al.   Our
                               Matsumoto [4]    [5]           [3]          method

The final result of image 1    yes              yes           no           yes

The final result of image 2    yes              yes           no           yes



is based on the parallel lines in each object to produce its results; (3) it can
deal with every object in an image separately and outputs the results for each
object immediately once that object has been processed; (4) from (1) and (2) it is
concluded that it operates at high speed in comparison with [4, 5] and [3]. Computer
experiments show that the proposed technique is effective and robust. Finally, Table 1
shows that the method of [3] is totally ineffective on the more complicated images 1
and 2. Thus our method is more efficient than that method, and it is also much faster
than the methods of [4, 5] and [3], because it is based on parallel lines
for detecting circular objects. Moreover, the method of Yip et al. [3] involves many
evaluations of trigonometric functions such as $\cos\theta$, $\sin\theta$ and $\tan^{-1}\theta$.



References 

1. D. H. Ballard. Generalizing the Hough transform to detect arbitrary shapes. P.R.,
13(2):111-122, 1981.

2. J. Illingworth and J. Kittler. A survey of the Hough transform. CVGIP, 44:87-116,
1988.

3. R. K. K. Yip, P. K. S. Tam and D. N. K. Leung. Modification of
Hough transform for circles and ellipses detection using a 2-dimensional array.
P.R., 25:1007-1022, 1992.

4. S. Tsuji and F. Matsumoto. Detection of ellipses by modified Hough transform.
IEEE T-Comp, 27:777-781, 1978.

5. H. K. Yuen, J. Illingworth and J. Kittler. Detecting partially occluded ellipses using
the Hough transform. Image and Vision Computing, 7(1):31-37, February 1989.



Neural Network Based Compensation of 
Micromachined Accelerometers for Static and Low 
Frequency Applications 



Elena Gaura, Richard Rider, and Nigel Steele 

Coventry University, Priory Street, Coventry, UK 
E.Gaura@coventry.ac.uk



Abstract. In this work, a single-shot direct inverse compensation procedure
based on neural networks is proposed, with application to micromachined
accelerometers. Compensation was first considered from an empirical viewpoint
to determine whether or not some kind of relationship exists between the
severity of different nonlinearities and the complexity of the network required
to control such nonlinearities. The procedure was then validated by applying
direct inverse control to the measured static characteristic of a micromachined
acceleration sensing element.



1. Introduction 

Micromachined accelerometers have been subject to extensive research over the last
decade [1, 2, 3]. These sensors are already found in commercial products, especially 
in the automotive industry [4]. However they are mainly used for applications in 
which only low precision and performance are required [3, 4]. In the near future it is 
expected that micromachined sensors will be used for more challenging applications 
both in specialised areas and also in mass consumer areas [2]. These markets require 
reliable accelerometers of low cost and small size, attributes which can be achieved by 
manufacturing the sensors in silicon [3]. 

In spite of the advances in micromachining, no sensor possesses ideal behaviour
and silicon sensors are no exception [2]. These devices not only exhibit
imperfections such as offset, drift, non-linearity and noise, but the magnitude of
these imperfections can also vary [6]. Compensation of time-variant ambient effects
requires continuous monitoring of these effects and on-line correction of the sensor 
behaviour. On the other hand, time-invariant departures from ideal behaviour can be 
corrected using single shot compensation procedures. Both procedures may require 
extra hardware and software and should therefore be considered during the design 
phase of the sensor system in order to minimise overall costs [2]. 

In this work, a single-shot compensation procedure based on neural networks is 
proposed, with application to micromachined accelerometers. The procedure aims to 
compensate time-invariant departures from ideal behaviour of such systems. 



R. Loganantharaj et al. (Eds.): lEA/AIE 2000, LNAI 1821, pp. 534-542, 2000. 
© Springer- Verlag Berlin Heidelberg 2000 




Micromachined Accelerometers for Static and Low Frequency Applications 535



2. Neural Network Based Direct Inverse Control 

Artificial neural networks can be used as a representation framework for modelling 
and controlling nonlinear dynamical systems [5, 6, 7]. In the literature on neural 
network architectures for control, a large number of control structures have been 
proposed and used [5, 7, 8], one of the simplest being direct inverse control which 
utilises an inverse system model. If the model of the sensor is invertible, then the 
inverse of the sensor model can be approximated. This inverse model is simply 
cascaded with the controlled system in order that the aggregated system results in an 
identity mapping between the desired response (i.e. the network input) and the 
controlled system output. Thus, the network acts directly as a controller in such 
configurations [5]. 
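The idea of cascading a system with its inverse model to obtain an identity mapping can be sketched in a few lines of Python. The sensor characteristic below is a hypothetical, analytically invertible function chosen only for illustration, and its exact inverse stands in for the trained inverse network; neither is taken from the paper.

```python
import math

def sensor(a):
    """Hypothetical static sensor characteristic (illustrative only):
    a saturating nonlinearity with a constant offset."""
    return 0.8 * math.tanh(1.5 * a) + 0.1

def inverse_model(v):
    """Exact inverse of `sensor`; a stand-in for the trained inverse network."""
    return math.atanh((v - 0.1) / 0.8) / 1.5

def compensated(a):
    """Sensor cascaded with its inverse model: ideally the identity mapping."""
    return inverse_model(sensor(a))

inputs = [-1.0 + 0.1 * k for k in range(21)]
max_error = max(abs(compensated(a) - a) for a in inputs)
```

With a learned inverse the residual error would depend on how well the network approximates the true inverse, so `max_error` would be small rather than at machine precision.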

The transducer design proposed here (based on the above technique) addresses
three types of manufacturing problems associated with micromachined sensors having
a capacitive type of pick-off. It has been shown, both by mathematical modelling and
by measurements, that these devices are inherently nonlinear [1, 6]. Typical nonlinear
effects which must be addressed are the offset of the seismic mass from the central
position between the plates, a piece-wise linear input-output characteristic and
squeeze film damping (this results in a hysteresis type, dynamic nonlinearity). Thus, in
order to provide an overall linear measurement, the nonlinear characteristic of the
sensor needs to be compensated. The compensation is performed here using an
artificial neural network (ANN) approach. Three types of networks were used, all
having a basic multi-layer perceptron (MLP) structure. The static nonlinearities such
as piece-wise linear and offset were compensated using a static MLP with a variable 
number of hidden nodes/layers. The dynamic nonlinearity problem was initially 
tackled using a tap-delayed lines (TDL) MLP network. Due to practical 
implementation considerations, a second approach using a novel type of ‘flag’ 
dynamic network was further attempted. All networks were trained using an improved 
form of the error-back-propagation (BKP) algorithm which included a form of 
dynamic learning management [9]. Compensation was first considered from an 
empirical viewpoint to determine whether or not some kind of relationship exists 
between the severity of different nonlinearities and the complexity of the network 
required to control such nonlinearities. The results of this study have direct 
implications in the hardware design of the neural transducer. The direct inverse 
compensation procedure was then validated by applying direct inverse control to the 
measured static characteristic of a micromachined acceleration sensing element. 



3. Nonlinear Characteristics Compensation 

Figures 1, 2, and 3 show three types of input-output transfer functions commonly 
found in micromachined sensors. In general, a combination of these effects will be 
present within a particular sensor characteristic. 




536 Elena Gaura et al. 




Fig. 1. Piece-wise linear input-output 
transfer function 




Fig. 2. Input-output transfer function due 
to offset of the seismic mass 




With reference to Figure 1, when θ = 45°, the characteristic is linear over the whole
dynamic range of the sensor and compensation is therefore unnecessary. When θ = 0°,
the sensor saturates and compensation becomes impossible. For intermediate values
of θ, compensation is possible, but designing a suitable neural network
architecture may be expected to become increasingly difficult as θ decreases.

The severity of nonlinearity for the second characteristic (offset of the seismic
mass, Figure 2) was quantified by an index calculated as the area enclosed between
the Y axis, the sensor characteristic and the ideal response sought by the compensating 
procedure (full line in Figure 2). Note that by cascading the sensor with its inverse 
model, the identity mapping will be produced. 







For the characteristics in Figure 3, the area of each hysteresis loop was chosen as a 
measure of the severity of nonlinearity. Because of the time history dependency 
present in this type of nonlinearity, initially, a tap-delayed line network was chosen to 
model the inverse of the sensor characteristic. A MATLAB program was designed to 
generate the training sets for such compensating networks, allowing any number of 
delayed inputs to be specified. After some preliminary investigations, it was decided 
that 2-input networks (the current and 1-unit delayed input signals) should be suitable 
for this particular application. 

The compensation task selected for all these nonlinearities was to reconstruct a 
sine wave of fixed frequency after it had been distorted by a nonlinearity. The sine 
wave was scaled to have an amplitude of just less than 1 in order to facilitate 
reconstruction using a neural network containing sigmoid activation functions. The 
training sets were constructed by uniformly sampling both the distorted and the 
original signals at 207 points/period, over a single period of the sine wave. These 
values were used directly as the network input and output respectively for the cases 
presented in Figures 1 and 2 and further processed to determine the one-step-back 
delayed input signal for the case in Figure 3. The test sets also contained 207 points, 
previously unseen by the networks. A dynamic training algorithm was designed which 
included both a variable learning rate and a momentum term [9]. 
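The construction of the training sets described above can be sketched as follows. The piece-wise linear characteristic, its breakpoint `knee`, the amplitude of 0.99 and all function names are assumptions made for illustration; the original sets were generated with a MATLAB program whose details are not given.

```python
import math

N_POINTS = 207  # samples per period, as stated in the text

def piecewise_linear(x, theta_deg, knee=0.5):
    """Hypothetical piece-wise linear characteristic: unit slope up to
    |x| = knee, then slope tan(theta) beyond it (theta = 45 deg is linear)."""
    slope = math.tan(math.radians(theta_deg))
    if abs(x) <= knee:
        return x
    return math.copysign(knee + slope * (abs(x) - knee), x)

def make_training_set(theta_deg, amplitude=0.99):
    """Pairs (network input, target) for the inverse model: the distorted
    sine wave is the input and the original sine wave is the target."""
    original = [amplitude * math.sin(2.0 * math.pi * k / N_POINTS)
                for k in range(N_POINTS)]
    distorted = [piecewise_linear(s, theta_deg) for s in original]
    return list(zip(distorted, original))

def add_delayed_input(pairs):
    """Two-input (tap-delayed) variant: current and 1-unit delayed input.
    The first sample takes its delayed value from the end of the period."""
    out = []
    for k, (current, target) in enumerate(pairs):
        previous = pairs[k - 1][0]
        out.append(((current, previous), target))
    return out

static_set = make_training_set(30.0)
linear_set = make_training_set(45.0)
tdl_set = add_delayed_input(static_set)
```

Wrapping the delayed input around the period is a design choice that suits a periodic test signal; for measured data the first sample would need separate handling.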

In order to determine the minimum network configuration necessary for calibrating 
each nonlinearity, an automated training program was designed to test predefined sets 
of network architectures. Successful training was defined by a SSE of 0.05 over 207 
samples. The program has proved of benefit since it has been able to test a very large 
variety of configurations without the need for human intervention and as a result, 
successful configurations have been found that would have only been chanced upon 
using manual restarting of the training process. The program was run for 6 degrees of 
nonlinearity for each of the characteristic types presented in Figures 1, 2, and 3. 
Figures 4, 5 and 6 show the minimum trainable network configurations found for each 
nonlinearity tested by the program. The network configuration is defined here as the 
number of synaptic weights in a network. 
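As a worked example of this complexity measure, the sketch below counts the synaptic weights of a fully connected MLP from its layer sizes. Whether bias weights are included in the count is not stated in the text, so the `include_biases` switch and the function name are assumptions.

```python
def count_weights(layer_sizes, include_biases=False):
    """Number of synaptic weights in a fully connected MLP.

    layer_sizes lists the number of units per layer, inputs first,
    e.g. [3, 5, 1] for a 3x5x1 network.
    """
    total = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        total += n_in * n_out        # connections between adjacent layers
        if include_biases:
            total += n_out           # one bias weight per receiving unit
    return total

weights_3_5_1 = count_weights([3, 5, 1])                  # 3*5 + 5*1 = 20
weights_3_8_1 = count_weights([3, 8, 1])                  # 3*8 + 8*1 = 32
weights_3_5_1_bias = count_weights([3, 5, 1], True)       # 20 + 5 + 1 = 26
```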



Fig. 4. Minimum trainable network configuration for calibrating a piece-wise linear
sensor characteristic (axes: Network Configuration against Theta / deg)








Fig. 5. Minimum trainable network configurations for calibrating offset in the sensor
characteristic (axes: Network Configuration against Nonlinearity Index (Area))



Fig. 6. Minimum trainable network configurations for calibrating hysteresis in the sensor
characteristic (axes: Network Configuration against Loop Area)

It should be noted that the dashed lines are merely illustrative and should not be 
interpreted as precisely defining the states between the experimental data points. 
However, in order to confirm the results obtained, validation tests were made at 
intermediate points for each type of nonlinearity and networks of corresponding 
configurations (read from the graphs) were successfully trained. 

The graph in Figure 4 shows that for low values of θ, a small decrease in angle will
lead to a large increase in the network complexity. On the other hand, for large values
of θ, a small change in angle requires little or no change in network complexity. The
inherent nonlinearity of the sensor and the range of manufacturing tolerances will 
therefore have a significant influence on the required complexity. 

In the graph in Figure 5, for a characteristic slope of 0.5, despite the increasing 
offset, the compensating network complexity did not increase. Decreasing the slope on 
the other hand, caused an abrupt increase in the network complexity. 









These results are consistent with those obtained for the piece-wise linear
characteristic compensation where a decrease in the characteristic slope corresponds
to a decrease in the angle θ (Figure 4) and an increase in the network complexity.

The graph in Figure 6 indicates that as the sensor departs from linear behavior, the 
required compensating neural network configuration changes much more rapidly for 
hysteresis than piece-wise linear transfer characteristics. Hence with a fixed hardware 
control configuration, only small variations in the parameters affecting hysteresis can 
be accepted against large variations permissible in those affecting the piece-wise 
linear characteristic. 

In the case of manufactured devices, the span of manufacturing tolerances and thus 
the span of nonlinearities would be known. Such graphs could therefore be used to 
select appropriate architectures without either over or under specifying the capacity of 
the hardware associated with neural network implementations. 



4. Compensation of Sensor Based on Measurements 

In order to characterise the behaviour of the micromachined sensing element
considered in this work, static measurements were undertaken by the authors by
mounting the sensing element on a dividing head and rotating it at a frequency of 1 Hz
in the gravitational field. In this way, accelerations between -g and +g could be
applied to the sensing element. The dotted line in Figure 7 shows the static
characteristic of an “articulated” sensing element with a solid seismic mass.

Over the acceleration range tested, the sensor characteristic exhibits only offset and 
hysteresis. The offset is due to the offset of the seismic mass from the mid-point 
between the two outer electrodes. At zero input acceleration, the sensor outputs were 
320mV and 400mV, respectively, giving an average offset of 360mV and a hysteresis 
of 80mV. The measured offset due to the pick-off electronics was only 6mV. From the 
measured results for -1g of 920mV and for +1g of -166mV, the sensitivity of the
device was calculated as -543mV/g. The offset is therefore equivalent to an 
acceleration of -0.662g and the hysteresis to an acceleration of 0.146g. In the previous 
section, offset and hysteresis were treated separately and individual compensation 
networks were designed for each case. Similarly therefore, for such a sensor, the 
appropriate network sizes may be deduced from the graphs in Figures 5 and 6. The 
resulting networks could then be trained and cascaded with the sensing element. 
However, it should be possible to compensate both offset and hysteresis with a single 
network. 
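The figures quoted in this paragraph can be reproduced with a short calculation. The variable names are hypothetical, and the sensitivity is assumed here to be the output difference between +1g and -1g divided by the 2g span, which gives the quoted -543 mV/g.

```python
# Measured values quoted in the text (all in mV).
v_minus_1g = 920.0                   # output at -1g
v_plus_1g = -166.0                   # output at +1g
v_zero_a, v_zero_b = 320.0, 400.0    # the two outputs at zero acceleration

# Assumed definition: slope of the line through the two end points.
sensitivity = (v_plus_1g - v_minus_1g) / 2.0       # mV per g

offset_mv = (v_zero_a + v_zero_b) / 2.0            # average offset
hysteresis_mv = abs(v_zero_b - v_zero_a)           # loop width at zero input

# Equivalent accelerations obtained by dividing by the sensitivity.
offset_g = offset_mv / sensitivity
hysteresis_g = hysteresis_mv / abs(sensitivity)
```

Under these assumptions the equivalent accelerations come out within about 0.002 g of the quoted -0.662g and 0.146g; the small difference in the last digit may reflect rounding of the measured values.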

As for the empirical study presented above, the compensation task was to 
reconstruct a low frequency sine wave after it has been distorted by the sensor. In 
order to be able to validate the compensation procedure, a forward model of the 
sensor was also built, based on the same structure as the inverse one. Both these 
models will now be discussed. 

Due to the presence of hysteresis, a tap-delayed line network was initially used to
model the inverse of the sensor characteristic. A 3-input network (the current,
1-unit delayed and 2-unit delayed signals) was chosen for this particular characteristic







identification/compensation. Figures 8 and 9 show the block diagrams of the training 
schemes used for the two models. 




Fig. 7. Measured static characteristic of the sensor (dotted line); effects of direct inverse
control on sensor behavior - compensated sensor characteristic (full line) (axis label:
Input acceleration [g])




Fig. 8. Neural networks training for 
forward modelling 



Fig. 9. Neural networks training for 
inverse modelling 



The training set was based on 103 input-output measurements taken from the 
sensor rotating in the gravitational field. The program designed for the study of 
hysteresis in the previous section was used here to generate the supplementary 
network inputs. The training algorithm and the learning rates and momentum 
parameters remained the same as in the previous examples. 

Two networks were successfully trained to represent the inverse and forward 
models of the sensor. These networks were subsequently connected (cascaded) aiming 
at a -500mV/g sensitivity for the calibrated sensor system. To test the system, a sine 
wave of 9.81V amplitude and 1 Hz frequency was scaled and sampled to provide 103
test points, approximately mid-way in between the measurement points. The static 
characteristic obtained by cascading a 3x5x1 network (forward model) with a 3x8x1 
network (inverse model) is shown in Figure 7 with a full line. 

It can be noted from this figure that both the sensor offset and hysteresis have been 
compensated. The measurement system functionality has therefore been significantly 











improved. The success of this procedure encouraged the implementation of the smart 
transducer as an embedded system with the neural processing being supported by an 
Intel 486 microprocessor. 

Embedding the neural network based procedure described above in hardware
proved to be a more complex task than expected. It was noticed that a high sampling
frequency of the sensor output is required in order to achieve good accuracy for the
compensated TDL-based system. It was concluded that a set of only 103
measurements over the ±1 g range would not be sufficient for designing a TDL-based
controller network with good hardware performance. Pursuing the use of TDL
networks would involve noise training of the networks, which in effect means
modelling the dynamic behaviour of the sensor and its inverse. Although this approach
would extend the use of the compensated system over the whole dynamic range of the
‘off-the-shelf’ sensor, the effort involved in gathering the data sets, training the
forward and inverse networks and actually implementing the compensated system in
hardware is not justified for an open-loop transducer design.

A novel approach was therefore developed to design the type of network able to
compensate history-dependent nonlinearities such as those exhibited by the sensor.
The networks, for both the direct and inverse models, are MLPs with two inputs, a
single output and two layers of hidden neurons. The novelty of the proposed network
type consists in using a ‘flag’ to describe qualitatively the one-step-back
history of the signal to be processed by the network, as opposed to the tap-delayed
approach. Hence, one network input is the current value of the input signal, whilst the
other is the ‘flag’, whose value depends on the evolution of the input signal. The ‘flag’
takes arbitrarily chosen values of 0.99 if the current input is greater than or equal to its
previous value and -0.99 if it is less. Figures 10 and 11 show the block diagrams of the
training schemes used for the forward and inverse neural models.
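
The construction of the two network inputs can be sketched in a few lines of Python. The sketch pairs each sample with a flag of 0.99 when the signal has not decreased since the previous sample and -0.99 when it has, as stated in the text. The function name `flag_inputs` and the treatment of the very first sample (which has no previous value) are assumptions made for illustration.

```python
def flag_inputs(signal, rising=0.99, falling=-0.99):
    """Pair each sample with a flag describing its one-step-back history."""
    pairs = []
    previous = None
    for current in signal:
        # Assumption: the first sample has no history, so it is treated
        # as non-decreasing and receives the 'rising' flag.
        if previous is None or current >= previous:
            flag = rising
        else:
            flag = falling
        pairs.append((current, flag))
        previous = current
    return pairs
```

Compared with the tap-delayed line, only one extra input is needed and no past samples have to be stored beyond the previous one.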




Fig. 10. ‘Flag’ neural networks training for forward modelling

Fig. 11. ‘Flag’ neural networks training for inverse modelling



In contrast to the TDL approach, the direct inverse control procedure implemented
with ‘flag’ networks has been successful and uncomplicated for this particular
problem. For applications where accuracy and linearity are needed over a larger
dynamic range and at higher frequencies, the use of TDL and noise training would
probably be required, notwithstanding the design effort.












5. Conclusions 

The aim of this work was to develop a one-shot procedure for the open-loop
compensation of micromachined accelerometers based on neural networks. Firstly, an
empirical study was carried out which revealed that relationships exist between the
severity of a nonlinearity and the size of the network required to correct that
nonlinearity. Graphs have been produced which enable the hardware requirements for
supporting the controller network to be deduced, once the sensor characteristic has
been identified.

An example of sensor identification and compensation has been considered, based 
on measurements taken from a bulk micromachined acceleration sensor. Over the 
range tested, the sensor characteristic exhibited both offset and hysteresis. A 
compensation procedure based on TDL networks was initially developed and although 
good simulation results have been obtained, the hardware implementation proved to 
be impractical. An alternative, simpler procedure based on a ‘flag’ type network was 
designed. A prototype hardware transducer was produced, based on this latter 
compensation procedure, which performed successfully. 



References 

1. Kraft, M.: "Closed-loop accelerometer employing oversampling conversion", PhD 
Thesis, Coventry University, 1997. 

2. Wise, K.D., “Micromechanical Systems Development in Japan”, Sept. 1994, Internet 
Aug. 1995, http://itri.loyola.edu/mems. 

3. Bao, M., Wang, W., “Future of microelectromechanical systems (MEMS)”, J. 
Sensors and Actuators A 56 (1996), pp. 135-141. 

4. Zimmermann, F. Le Hung, JM. Dujardin, “Surface Mount Accelerometer for High 
Volume Production”, Eurosensors X, Leuven, Belgium, September 1996, 
Proceedings, Vol.5, pp. 1497-1 500. 

5. Irwin, G.W., Warwick, K., Hunt, K.J., “Neural networks applications in control”, 
lEE Control Engineering Series 53, Short Run Press Ltd., UK, 1995. 

6. Gaura, E., Steele, N., Rider, R.J., "A Neural Network Approach for the Identification 
of Micromachined Accelerometers", Second International Conference on Modelling 
and Simulation of Microsystems, MSM’99, Proceedings, pp.245-248, April 1999 
San Juan, Puerto Rico, USA. 

7. Steele, N., Gaura, E., Godjevac, J., "Neural Networks in Control", The 3rd European 
IEEE Workshop on Computer-Intensive Methods in Control and Data Processing, 
Proceedings, pp. 187-199, Prague, Czech Republic, Sept. 1998. 

8. Fabri, S. and Kadirkamanathan, A. “Dynamic Structure Neural Networks for stable 
adaptive control of nonlinear systems”, IEEE Trans, on Neural Networks, vol. 7, no. 
5,pp. 1151-1155, 1996. 

9. Gaura E., Burian A., "A dedicated medium for the synthesis of BKP networks", 
Romanian J. of Biophysics, Vol. 5, No. 15, 1995, Bucharest, Romania. 




Improving Peanut Maturity Prediction Using a Hybrid 
Artificial Neural Network and Fuzzy Inference System 



H.L. Silvio¹, R.W. McClendon² and E.W. Tollner²

¹Artificial Intelligence Center, University of Georgia, Athens, GA

hsilvio@ai.uga.edu

²Biological and Agricultural Engineering, University of Georgia, Athens, GA



Abstract. The goal of this research was to improve the prediction of maturity in
peanuts through the development of an adaptive-network-based fuzzy inference
(ANFIS) model. There were three specific objectives. The first two were to
develop models for comparison with previous research results using an artificial
neural network (ANN) and fuzzy inference system (FIS) separately. The third
objective was to expand the research by determining the robustness of the
developed ANFIS model in predicting maturity for a season not used in model
development. While the hybrid model was able to improve on the results of a
FIS, the hybrid model was unable to improve on the results of an ANN. The
developed model was relatively robust.



1 Introduction 

Accurately predicting peanut maturity is important for protecting quality and 
maximizing yield. Harvesting peanuts outside the optimal time results in premature 
peanuts that are at a higher risk of aflatoxin contamination or overly mature peanuts 
that lack the expected flavor [1]. Harvesting at the optimal maturity also increases the 
shelf life of peanuts [2]. Determining peanut maturity is complex, involving 
knowledge of the planting season, rainfall, and temperature. Williams and Drexler [3] 
developed the hull scrape method to predict peanut maturity. This method is based on
the known correlation between maturity level and pod mesocarp color, which is
revealed when the exocarp is scraped off. Six color classes (numbered one through
six) of white, light yellow, dark yellow, orange, brown, and black indicate the
progression of peanut maturity from immature (1) to peak maturity (4 and 5) to
overmature (6). The problems associated with this method include the
subjectivity of assessment, the time needed for assessment, and the labor required to
both move the necessary equipment and conduct the assessment [4].

Tollner et al. [4] proposed using nuclear magnetic resonance (NMR) as an
alternative to the hull scrape method for predicting peanut maturity.
NMR involves subjecting an object in a magnetic field to a radiofrequency (RF)
pulse at the appropriate frequency [5]. Depending on the NMR pulse parameters, the
response of the object’s atoms, measured on an oscilloscope, is described as the
spin-echo or the free induction decay curve [6].



R. Loganantharaj et al. (Eds.): IEA/AIE 2000, LNAI 1821, pp. 543-548, 2000.
© Springer-Verlag Berlin Heidelberg 2000







Boudolf et al. [7] demonstrated that an artificial neural network (ANN) could be 
trained with NMR data to predict peanut maturity levels. Using field data obtained 
over three years, they developed one- and six-output ANNs to output the six color 
classes used in the hull scrape method. The field data contained 1268 patterns 
consisting of days after planting (DAP), wet weight (WTWGT), and NMR 
measurement data (peak amplitude - FIDPK, inflection point amplitude - FID40, and 
amplitude of the echo pulse - ECHO). The data was partitioned into model 
development and model evaluation datasets. Although model evaluation was 
performed with observations not used in training, data from three seasons were 
included in each dataset. The output of the ANN was used to determine the 
prediction of the maturity class as an integer value from one to six. Shahin et al. [8] 
developed a fuzzy inference system to predict peanut maturity using a subset of the 
data used by Boudolf et al. [7]. This study was limited in that only one season of data 
was used, only two of the available five inputs (DAP and FIDPK) were used, and a 
trial-and-error method of determining the membership functions was used. As with
Boudolf et al. [7], model evaluation was not performed with data from a season
different from the model development data.

2 Model Development 

Tollner et al. [4] obtained peanuts for the three seasons from 1992 to 1994 from the 
UGA Southwest Branch Station in Plains, GA. Samples were obtained in weekly 
intervals, for periods of four to six weeks, starting each season in August. The plants 
were hand-harvested, bagged, and transported to the UGA Experiment Station in 
Griffin, GA for kernel removal, hull scrape analysis, NMR, gravimetric analysis, and 
moisture analysis. A Southwest Research Model TT-102-SP NMR spectrometer, 
programmed with a Mid Continent MP 5401 pulse programmer and connected to a 
Gould Model 1604 digital storage oscilloscope, was used to obtain the NMR 
measurements. With this equipment, FIDPK, FID40, and ECHO data were obtained 
for each sample. WTWGT and DAP, which complete the data set, were also 
recorded. Tollner et al. [4] provided specific information on the peanut sample 
preparation and NMR procedures. The resulting datasets were used by Boudolf et 
al. [7] and Shahin et al. [8]. 

The MathWorks, Inc. MATLAB®¹ [9] version of ANFIS was used for this research
and run on a Dell 200 Pentium® with 64 MB RAM. This implementation of ANFIS is
based on a combined neural network and fuzzy inference system developed by
Jang [10]. The fuzzy inference system requires at least two inputs, fuzzy rules for
combining those inputs, and a single output with singleton membership functions
(Sugeno style). The neural network component of ANFIS adjusts the parameters of the
input membership functions of the fuzzy inference system, using the input
datasets [10]. The data are usually partitioned into training, testing, and validation
datasets for ANN model development and evaluation. Boudolf et al. [7] partitioned
each year of data into three datasets.

¹ Mention of the brand name does not imply endorsement by the authors.

Final dimensions for the input membership functions were determined for each
input variable by the ANN search. After determining that the algorithm would
arrive at the same membership functions regardless of the initial parameters, the rest
of the experiments were done by starting at the same initial settings and allowing the
algorithm to configure the parameters. The fuzzy rules considered every possible
combination of all of the inputs. The single output consisted of singleton output
membership functions equal in number to the fuzzy rules, which were reflected in the
six possible maturity class categories.

Various parameter setting combinations within ANFIS were tried to determine the
most effective. Backpropagation and hybrid training methods are available. The
hybrid method combines a backpropagation algorithm with a least-squares
type of method. Keeping all other parameters identical while varying the training
method showed that the hybrid method reduced the error more. Varying the
number of training epochs with different sample data showed that, by 90 epochs,
the error rate for the testing dataset had either begun to increase, indicating
overfitting of the training data, or converged to a stable error rate. Given the Sugeno
nature of the output, constant or linear membership functions were the options.
Keeping all other parameters identical while varying the membership function choice
showed that the constant option reduced the error more. After the best method was
found, all subsequent experiments kept those same parameters: constant membership
functions were used, and ANFIS was trained using the hybrid optimization method.

Each input was a crisp numerical value limited to the universe of discourse of that 
variable. Fuzzification of these inputs involved determining the degree of 
membership in different linguistic sets. After fuzzification, the chosen operator, 
AND, was applied to the inputs. In order for a rule to fire, the antecedent of the rule 
must be satisfied. With the chosen operator, this means that all of the inputs must 
succeed and the minimum degree of membership is used. Defuzzification of the 
fuzzy set was accomplished with a centroid calculation, which calculated the center of 
the area and returned a single crisp output value. The real-valued output on the
range 0.0 to 7.0 represented the predicted maturity class. Comparing that output with
the observed maturity classification provided a means for evaluating the ANFIS 
models developed. 
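
The fuzzification, AND and defuzzification steps described above can be sketched as follows. This is a minimal illustration, not the MATLAB implementation used in the study: the triangular membership functions, the function names `triangle` and `infer`, and the parameter layout are all assumptions. For constant consequents the crisp output is taken here as the firing-strength-weighted average of the rule constants, which is the usual zero-order Sugeno defuzzification; treating it as the centroid calculation mentioned in the text is likewise an assumption.

```python
from itertools import product

def triangle(x, left, peak, right):
    """Triangular membership function; returns a degree in [0, 1]."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)

def infer(inputs, membership_sets, consequents):
    """Zero-order Sugeno inference with min as the AND operator.

    inputs          -- one crisp value per input variable
    membership_sets -- per input, a list of (left, peak, right) triangles
    consequents     -- constant output per rule, in itertools.product order
    """
    # Fuzzification: degree of membership of each input in each linguistic set.
    degrees = [
        [triangle(x, *params) for params in sets]
        for x, sets in zip(inputs, membership_sets)
    ]
    # One rule per combination of membership functions; AND = minimum degree.
    strengths = [min(combo) for combo in product(*degrees)]
    total = sum(strengths)
    if total == 0.0:
        raise ValueError("no rule fired for these inputs")
    # Defuzzification: weighted average of the constant rule outputs.
    return sum(w * c for w, c in zip(strengths, consequents)) / total
```

In ANFIS the triangle parameters and the rule constants would be the quantities adjusted during training; here they are simply passed in.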

Three measures of accuracy were used to evaluate the results of the ANFIS 
models. The first measure, root-mean-squared error (RMSE), was calculated before
rounding of the real-valued output results. Smaller RMSE values indicate higher
accuracy. The second accuracy measure, percent correct, used the integer values and 
was the percentage of predicted outputs in a dataset that exactly matched the observed 
maturity class. Percent correct was chosen as a form of measurement because it is 
frequently used in evaluating the accuracy of classifiers. The third measure, the chi- 
square statistic, indicated the similarity of the predicted maturity class distribution to 
the observed maturity class distribution. The lower the chi-square value, the more 
closely the distributions match. The chi-square statistic was chosen because peanut 
harvesting decisions are currently based on a distribution of the maturity values from 
a sample rather than the maturity of the individual kernels. Both the percent correct 
and chi-square statistic were determined using the integer classification values and 
have been used in previous research [7] [8], making these numbers necessary for 
comparisons. The RMSE values were used to compare alternative ANFIS 
configurations throughout the experiments. 
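
The three accuracy measures can be sketched in Python as below. The exact rounding rule and the exact form of the chi-square statistic are not given in the text, so both are assumptions here: outputs are rounded to the nearest integer and clipped to the classes 1 to 6, and the chi-square statistic is computed in the Pearson form with the observed class counts taken as the expected counts. All function names are hypothetical.

```python
import math
from collections import Counter

def rmse(predicted, observed):
    """Root-mean-squared error on the real-valued outputs (before rounding)."""
    n = len(observed)
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(predicted, observed)) / n)

def to_class(value, lowest=1, highest=6):
    """Round a real-valued output to an integer maturity class.

    Assumption: values outside 1..6 are clipped to the nearest valid class.
    """
    return min(highest, max(lowest, int(math.floor(value + 0.5))))

def percent_correct(predicted_classes, observed_classes):
    """Percentage of predictions that exactly match the observed class."""
    hits = sum(p == o for p, o in zip(predicted_classes, observed_classes))
    return 100.0 * hits / len(observed_classes)

def chi_square(predicted_classes, observed_classes, classes=range(1, 7)):
    """Pearson-style statistic comparing the two class distributions.

    Assumption: the observed counts play the role of expected counts, and
    classes with no observed patterns are skipped to avoid division by zero.
    """
    pred = Counter(predicted_classes)
    obs = Counter(observed_classes)
    return sum((pred[c] - obs[c]) ** 2 / obs[c] for c in classes if obs[c] > 0)
```

Note that RMSE is computed on the unrounded outputs while the other two measures use the integer classes, which is why a configuration can improve on one measure and worsen on another.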







3 Model Evaluation 

Using the scheme discussed in the previous section, a series of ANFIS models was 
developed. The 1992 data were used to develop an ANFIS model for comparison 
with the fuzzy inference system results of Shahin et al. [8]. This earlier research 
combined the available training and testing sets into one model development set. For 
the ANFIS model development, the separate training and testing sets were used. An 
ANFIS model was developed using two inputs (DAP and FIDPK) with six 
membership functions each in order to compare with the results of Shahin et al. [8]. 
Subsequently, an ANFIS model was developed with three membership functions for 
each of the two inputs. Six membership functions for the two inputs required 36
rules, whereas two inputs with three membership functions required only nine rules.
The six membership functions per input were reconfigured as three per input and 
represented by the linguistic variables Low, Medium, and High. As described earlier, 
the ANN determined the final shape and location of the membership functions. 
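
Because the rules cover every combination of membership functions, the rule counts quoted above follow directly from a Cartesian product. The short sketch below reproduces them; the helper name `rule_antecedents` and the placeholder labels `MF1` to `MF6` are hypothetical, since the text names only the Low, Medium and High sets.

```python
from itertools import product

def rule_antecedents(labels_per_input):
    """List every combination of one linguistic label per input."""
    return list(product(*labels_per_input))

three = ["Low", "Medium", "High"]
nine_rules = rule_antecedents([three, three])        # DAP x FIDPK
six = [f"MF{i}" for i in range(1, 7)]                # hypothetical label names
thirty_six_rules = rule_antecedents([six, six])
```

With all five inputs the same product grows to 3**5 = 243 rules for three membership functions per input, which illustrates why the number of membership functions per input matters.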

An ANFIS model was developed using all available data from the three seasons, to 
compare to the earlier results of Boudolf et al. [7]. Another ANFIS model was 
developed using the data uniformly partitioned by Boudolf et al. [7] to distribute the 
patterns equally across the six classes in the training and testing sets. They found that 
equal distribution was desirable with an ANN because otherwise the network would 
give undue weight to any category that contained significantly more patterns. This 
equal distribution was created by eliminating patterns, to allow for 36 patterns in each 
class in the training and testing sets. The evaluation set was left in its original state. 
Where noted, these sets were used instead of the sets discussed above. 
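
The uniform partitioning described above amounts to discarding patterns until every class holds the same number. A minimal sketch follows; the function name `balance_by_class`, the data layout and the random selection of which surplus patterns to drop are assumptions, since the text states only that patterns were eliminated to leave 36 per class.

```python
import random
from collections import defaultdict

def balance_by_class(patterns, per_class=36, seed=0):
    """Keep at most `per_class` patterns of each class.

    patterns -- iterable of (features, maturity_class) pairs
    Assumption: surplus patterns are discarded at random with a fixed seed.
    """
    by_class = defaultdict(list)
    for features, label in patterns:
        by_class[label].append((features, label))
    rng = random.Random(seed)
    balanced = []
    for label in sorted(by_class):
        members = by_class[label]
        rng.shuffle(members)
        balanced.extend(members[:per_class])
    return balanced
```

The cost of this balancing is a smaller training set, which is relevant when comparing results obtained with the original and the uniformly partitioned data.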

Models were developed using a single year of data for 1993 and 1994 in order to 
determine the ability of the model to account for the variation in a single season. In 
order to perform a more thorough validation, a system was trained with the 1992 
and 1993 data only, leaving the 1994 data for the validation set. This was a test of the 
ability of the model to generalize to a different season of data. 



4 Results and Discussion 

With six membership functions per input, the ANFIS model achieved a percent 
correct value of 45% compared to the FIS percent correct of 45% [8]. The chi-square 
value was 80 and the RMSE was 0.77. With three membership functions for each 
input, percent correct increased to 49%, chi-square decreased to 72, and the RMSE 
increased to 0.85. Since the percent correct, used by previous researchers, and the 
chi-square, used for harvesting decisions, both improved, changing to three 
membership functions was considered an improvement. No other scenarios of 
membership functions per input were considered. 

The chi-square value, RMSE, and percent correct, using all five inputs from all
three available years of non-uniformly partitioned data, were 107, 0.86, and 43%,
respectively. The system had a tendency to overpredict maturity classes 4 and 5,
while underpredicting maturity class 6. With the uniformly partitioned dataset from
Boudolf et al. [7], the evaluation dataset produced a chi-square of 163, RMSE of 0.66,
and percent correct of 32%. The ANFIS model strongly overpredicted maturity
class 1. The original model development dataset produced results superior to those of
the uniformly partitioned model development dataset. The chi-square value and percent
correct results of 107 and 43% for the ANFIS model developed with the original
model development dataset compare unfavorably with the chi-square value and percent
correct of 55 and 46% for the ANN used by Boudolf et al. [7].

ANFIS models were developed for the 1993 and 1994 data separately, using all 
five inputs. The chi-square value, RMSE, and percent correct for the five-input
system trained for the 1993 data were 169, 0.77, and 37%, respectively. The
chi-square value, RMSE, and percent correct for the five-input system trained for
the 1994 data were 20, 0.84, and 32%, respectively. Given the corresponding values
for 1992 of 16, 0.49, and 47%, this indicates considerable variation in the noise in the 
data over the individual seasons. Clearly, the 1993 season had the most variability in 
the data. While the percent correct was higher in 1993 than in 1994, the much larger 
chi-square value for 1993 indicates that for the system, when it did predict incorrectly, 
the predictions were extremely inaccurate. The model developed for 1993 strongly 
overpredicted peanut maturity classes 2 and 6 and underpredicted peanut maturity 
class 5. The model developed for 1994 strongly overpredicted only peanut maturity 
class 3. 

An RMSE of 0.86 was the result of the best ANFIS, using 1992-1994 data, 
randomly assigned to the training, testing, and validation sets. The RMSE of 0.90 for 
the system developed with the 1992-1993 data and evaluated with 1994 data indicated 
a slight decrease in accuracy. The chi-square statistic of 136 increased from the 
earlier result of 107, and the percent correct of 28% showed a decrease from the 
earlier result of 43%. These results indicate that the ANFIS model is relatively robust 
in predicting peanut maturity for a season not used in model development. The 
system strongly overpredicted peanut maturity class 4. Likely, this is a reflection of 
seasonal differences. 



5 Summary and Conclusions 

A neuro-fuzzy system was developed, using data from two inputs from 1992, 
which improved the prediction of maturity of peanuts when compared with a FIS 
using the same two inputs. Using two of the five available inputs with six 
membership functions per input from one of the available three years resulted in a 
percent correct of 45%, exactly the same as the result achieved with a FIS [8]. Using 
two of the five available inputs with three membership functions per input from one 
of the available three years resulted in a higher percent correct of 49%. The ANFIS
model thus improved peanut maturity prediction compared to a FIS.

Using all of the field data from all available years resulted in a percent correct 
of 43% and chi-square value of 107. Using the data specifically partitioned by 
Boudolf et al. [7] resulted in a percent correct of 32% and chi-square value of 163. 
These results might suggest that the size of the training and testing sets was a factor.
Using the non-uniformly partitioned, and thus larger, datasets resulted in accuracy 
levels closer to those achieved using an ANN [7]. 







Developing separate systems for the 1993 and 1994 data demonstrated the wide 
variation in noise over the different growing seasons, as indicated by the highly 
disparate chi-squares of 169 and 20, respectively. The ANFIS model performed well 
when evaluated with data from a different growing season than the development data, 
demonstrating that the model was fairly robust. 



References 

1. Dorner, J.W., R.J. Cole, T.H. Sanders and P.D. Blankenship. (1989) 
Interrelationship of kernel water activity, soil temperature, maturity, and 
phytoalexin production in preharvest aflatoxin contamination of drought-stressed 
peanuts. Mycopathologia. 105(2): 117-128. 

2. Sanders, T.H., J.R. Vercellott, and G.V. Giville. (1987) Flavor-maturity
relationship of florunner peanuts. APRES. 19: 42. 

3. Williams, E.J. and J.S. Drexler. (1981) A non-destructive method for determining 
peanut pod maturity. Peanut Science. 8: 134-141. 

4. Tollner, E.W., V.A. Boudolf III, R.W. McClendon, and Y.-C. Hung. (1998)
Predicting Peanut Maturity with Magnetic Resonance. Transactions of the ASAE. 
41(4): 1199-1205. 

5. Bovey, F.A. and P.A. Mirau. (1996) NMR of Polymers. San Diego, CA: Academic 
Press. 

6. Friebolin, H. (1993) Basic One and Two Dimensional NMR Spectroscopy (2nd, 
Enlarged edition). Germany: VCH Verlagsgesellschaft mbH. 

7. Boudolf III, V.A., E.W. Tollner, and R.W. McClendon. (1999) Predicting Peanut 
Maturity with NMR - An Artificial Neural Network Approach. Transactions of the 
ASAE (Accepted). 

8. Shahin, M.A., B.P. Verma, and E.W. Tollner. (1999) Fuzzy Logic Model for 
Predicting Peanut Maturity. Transactions of the ASAE (Accepted). 

9. MatLab: The Language of Technical Computing, Version 5.3. (1984-1999) The 
Mathworks, Inc. 

10. Jang, J-S. (1993) ANFIS: Adaptive-Network-Based Fuzzy Inference System. 
IEEE Transactions on Systems, Man, and Cybernetics. 23(3): 665-685. 

11. Huang, Y., R.E. Lacey, and A.D. Whittaker. “Integration of Advanced Statistical 
Methods and Artificial Neural Networks for Food Quality Analysis and 
Prediction.” Presented at the July 1999 ASAE/CSAE-SGCR Annual International 
Meeting, Paper No. 99134. ASAE, 2950 Niles Road, St. Joseph, MI 49085-9659 
USA. 




CIM — The Hybrid Symbolic/Connectionist 
Rule-Based Inference System 



Pattarachai Lalitrojwong 

Faculty of Information Technology 
King Mongkut’s Institute of Technology Ladkrabang
Chalongkrung Rd., Ladkrabang, Bangkok 10520, Thailand
pattarachai@it.kmitl.ac.th



Abstract. Previous research has shown that connectionist models are suitable
for cognitive and natural language processing tasks. An inference mechanism is
a key element in commonsense reasoning in a natural language understanding
system. This research project offers a connectionist alternative to Buchheit’s
symbolic inference module for INFANT called the Connectionist Inference
Mechanism (CIM). CIM is a hybrid cognitive model that combines the
advantages of the symbolic approach, local representation, and parallel
distributed processing. Moreover, it makes good use of its modular structure.
Several modules work together in CIM, including memory, neural networks,
and a binding set, to perform the inference generation. Besides rule application
capability, CIM is also able to perform variable binding. A number of
experiments have shown that CIM can make inferences appropriately.



1 Introduction 

Previous research has shown that connectionist models are suitable for cognitive and
natural language processing tasks [4, 5, 6, 7, 8]. While Buchheit [1, 2] has been using
a symbolic approach to make improvements in the reasoning process in his original
conversational system, the INFANT System, this paper demonstrates an alternative
approach to the inferencing module, a connectionist approach called the Connectionist
Inference Mechanism (CIM). CIM is intended to demonstrate the feasibility of
implementing a workable inference mechanism using a connectionist approach. Its
main task is to perform inference generation. This high-level inferencing model is
capable not only of rule application but also of variable binding.



2 The System Architecture of CIM 

CIM, a connectionist alternative to the inference module of the INFANT
System [1, 2], is a modular system consisting of three main components: a binding set,
memory, and neural networks. These separate components, each handling its own
subtask, are integrated into a single high-level system to perform the inference
generation. Figure 1 illustrates the structure and system flow of CIM.



R. Loganantharaj et al. (Eds.): IEA/AIE 2000, LNAI 1821, pp. 549-555, 2000.
© Springer-Verlag Berlin Heidelberg 2000








Before describing the three main components of CIM in detail, we would like to 
present a set of rules and facts that we will use as an instance of the system’s 
knowledge in Table 1 to facilitate further explanation. 



2.1 Binding Set 

The binding set is the set of all possible bindings of the variables used in a particular
piece of world knowledge. For instance, according to Table 1, the variables that need
bindings are PERSON1, PERSON2 and PERSON3, and the persons in the story
are BO, MO and I. Therefore, we need a binding set of 6 possible bindings, as shown
in Table 2. In any particular binding, PERSONi and PERSONj (i ≠ j) cannot
be bound to the same person, since they are considered to be different persons.
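
Since distinct variables must be bound to distinct persons, the binding set is the set of permutations of the persons over the variables. A minimal Python sketch, with the hypothetical helper name `binding_set`, enumerates the six bindings for the running example:

```python
from itertools import permutations

def binding_set(variables, constants):
    """Enumerate bindings that map distinct variables to distinct constants."""
    return [dict(zip(variables, combo))
            for combo in permutations(constants, len(variables))]

bindings = binding_set(["PERSON1", "PERSON2", "PERSON3"], ["BO", "MO", "I"])
```

With three variables and three persons this yields 3! = 6 bindings; the count grows factorially with the number of persons, which is worth keeping in mind for larger stories.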



2.2 Memory 

There are two parts of the memory component in CIM: working memory and a
working set. Both are responsible for storing and retrieving fact patterns. At
the beginning, working memory stores only the given propositions (facts). While the
inference process is being performed, the inferred propositions are added to
working memory. After the process completes, working memory contains all the
given and inferred propositions. The working set can be viewed as temporary memory
that is used during the inference process. It stores the propositions generated
(inferred) by the rule inference network (RIN) at each step of inferencing. After each
inferencing step, all the propositions in the working set are moved into working memory.



Table 1. An example of facts and rules

Statements                               NVN representation

Rules:
1. I like a person who helps me.         IF [PERSON1 HELP I]
                                         THEN [I LIKE PERSON1]
2. If a person who knows computers       IF [PERSON1 WITH PERSON2]
   is with another person, s/he can      AND [PERSON1 KNOW COMPUTER]
   help that person.                     THEN [PERSON1 HELP PERSON2]
3. If a person is logical, he knows      IF [PERSON1 LOGICAL _]
   computers.                            THEN [PERSON1 KNOW COMPUTER]
4. If a person is with someone who       IF [PERSON1 WITH PERSON2]
   is with another person, he is         AND [PERSON2 WITH PERSON3]
   with that person.                     THEN [PERSON1 WITH PERSON3]
5. If a person is with another           IF [PERSON1 WITH PERSON2]
   person, that person is with him       THEN [PERSON2 WITH PERSON1]
   too.

Facts:
6. Bo is logical.                        [BO LOGICAL _]
7. Bo is with Mo.                        [BO WITH MO]
8. Mo is with me.                        [MO WITH I]



Table 2. Binding set

PERSON1    PERSON2    PERSON3
BO         MO         I
BO         I          MO
MO         BO         I
MO         I          BO
I          BO         MO
I          MO         BO



2.3 Neural Networks 

Several neural networks cooperate in CIM in order to generate inferences when the
system acquires facts and rules from users. The neural network component consists of
binding networks and a rule inference network. These neural networks are
feed-forward multilayer networks. The details of these networks are described below.

Since CIM makes good use of the modular architecture, each network can be
trained in its individual task separately and in parallel, as long as they are all trained
with compatible input/output data. During training, the weights in each neural network
module change to improve its performance in its task. After these networks learn
their tasks, they are connected for performance.



Binding Networks. Binding networks are responsible for the variable binding task. 
Given a binding instance from the binding set, each binding network generates a 
particular proposition according to all the propositions from the antecedents and 
consequents of given rules. The input of each binding network is the current binding 
instance from the binding set. The output of each binding network is a generated 
proposition. For example, according to the rules from Table 1, there are nine different 
patterns of propositions (NVNs) taken from all the antecedents and consequents as 
shown in Table 3. 



Table 3. Rule table

NVN/Unit    Meaning                       Active when
1           [PERSON1 HELP I]              -
2           [I LIKE PERSON1]              1
3           [PERSON1 WITH PERSON2]        -
4           [PERSON1 KNOW COMPUTER]       6
5           [PERSON1 HELP PERSON2]        3,4
6           [PERSON1 LOGICAL _]           -
7           [PERSON2 WITH PERSON3]        -
8           [PERSON1 WITH PERSON3]        3,7
9           [PERSON2 WITH PERSON1]        3



All nine binding networks are responsible for generating their own NVN patterns.
If the current binding is the first instance from Table 2, which is {BO/PERSON1,
MO/PERSON2, I/PERSON3}, the input of each binding network will be
[BO | MO | I], and these nine binding networks will generate nine bound propositions
as shown in Table 4.



Table 4. Propositions generated by binding networks given the binding instance
{BO/PERSON1, MO/PERSON2, I/PERSON3}

Binding network    Generated proposition
Bind1              [BO HELP I]
Bind2              [I LIKE BO]
Bind3              [BO WITH MO]
Bind4              [BO KNOW COMPUTER]
Bind5              [BO HELP MO]
Bind6              [BO LOGICAL _]
Bind7              [MO WITH I]
Bind8              [BO WITH I]
Bind9              [MO WITH BO]



The “ChkMem” process then checks whether these generated propositions exist in
working memory, and gives the results to the input layer of the rule inference network
to make inferences.







Rule Inference Network (RIN). RIN serves as a store of rules. In the training 
step, the network is trained to learn all the given rules. In the performance step, the 
system can make an inference as follows: if output unit i of RIN is active, the system 
will add to the working set a proposition NVNi, which is generated by binding network 
Bindi, as an inferred proposition. 

According to the rules from Table 1, we can create a rule table as shown in Table 3 
to represent RIN. In the same manner, we will have (at least) nine units in the input 
and the output layers of RIN. The i-th unit of the input layer represents the same NVN 
as the i-th unit of the output layer. Besides, it should correspond to the NVN 
generated by binding network Bindi. 

Since the input layer has nine units, there are a total of 512 possibilities for the 
input. The network is trained from these 512 cases. For each case, its desired output is 
set according to the rule table. For example, if the input is 101100010, the desired 
output will be 010010001. Output unit 2 is active because input unit 1 is active. 
Output unit 5 is active because input units 3 and 4 are active. Output unit 9 is active 
because input unit 3 is active. Although input unit 8 is active, it has no effect. 
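
The construction of the 512 training cases can be sketched in a few lines of Python. This is only an illustration of how the desired outputs follow from the "Active when" column of the rule table; the names `RULES`, `desired_output` and `training_cases` are hypothetical and do not come from the CIM implementation.

```python
# "Active when" column of the rule table:
# output unit -> input units that must all be active (hypothetical encoding).
RULES = {2: [1], 4: [6], 5: [3, 4], 8: [3, 7], 9: [3]}
N_UNITS = 9


def desired_output(input_bits: str) -> str:
    """Return the 9-bit target for a 9-bit input (unit 1 is the leftmost bit)."""
    active = {i + 1 for i, bit in enumerate(input_bits) if bit == "1"}
    return "".join(
        "1" if unit in RULES and all(u in active for u in RULES[unit]) else "0"
        for unit in range(1, N_UNITS + 1)
    )


# One (input, desired output) pair for each of the 2^9 input patterns.
training_cases = [
    (format(k, "09b"), desired_output(format(k, "09b")))
    for k in range(2 ** N_UNITS)
]

print(len(training_cases))          # 512
print(desired_output("101100010"))  # 010010001
```

Units 1, 3, 6 and 7 never appear as keys in `RULES`, so their output bits are always 0, which matches the "-" entries in the rule table.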

The system accesses the rules when there is a new fact. RIN carries out an 
inference generation and yields output units based on the current binding and the 
facts. The output units determine which NVNs can be inferred and will be 
subsequently added to the working set. These NVNs come from the output propositions 
generated by the corresponding binding networks. For example, if the fourth 
output unit of RIN is active and the current binding for PERSON1 is BO, then NVN4 
or [BO KNOW COMPUTER], the proposition generated from Bind4, will be added 
to the working set. 

It should be noted that RIN has the advantage of local representations: the 
network functions in parallel at the knowledge level. That makes it possible for RIN 
to pursue many inferences at the same time. 

3 Example Run 

Let’s examine an example run of CIM with a set of facts and rules from Table 1. First, 
all neural networks have been trained as described previously. The binding set 
contains all possible bindings as shown in Table 2. The working set is initially empty, 
and working memory stores all given facts, which are [BO LOGICAL _], [BO WITH 
MO], and [MO WITH I]. 

The first loop of the inference process starts with [BO | MO | I] as the input of all 
binding networks. Then, Bind1 generates NVN1, which is [BO HELP I]; Bind2 
generates NVN2, [I LIKE BO]; and so on. 

The “ChkMem” process checks which one of these generated NVNs is stored in 
working memory. NVNs 3, 6, and 7, which are [BO WITH MO], [BO LOGICAL _], 
and [MO WITH I], can be found in working memory. Therefore, input units 3, 6, and 
7 of RIN are active; while the others are not. RIN then determines which NVNs can 
be inferred; and activates the proper output units, which are units 4, 8, and 9. So the 
“ChkNVN” process adds NVNs 4, 8, and 9, which are [BO KNOW COMPUTER], 
[BO WITH I], and [MO WITH BO], into the working set. 

The next loop of the inference process will be performed with the next binding. 
When all the loops of the inference process complete, that is, all bindings in the 
binding set have been considered, the inferred propositions in the working set will be moved 
to working memory. If there is at least one "new" inferred proposition that has 
never been stored in working memory, the inference process must be run again. The 
inference process is finished when there are no new inferred propositions. 

After the inference process is completed, working memory contains all given and 
inferred propositions, which are [BO LOGICAL _], [BO WITH MO], [MO WITH I], 
[I WITH MO], [BO KNOW COMPUTER], [BO WITH I], [MO WITH BO], [BO 
HELP I], [I WITH BO], [BO HELP MO], and [I LIKE BO]. 
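
The example run can be reproduced with a purely symbolic sketch in which plain Python functions stand in for the trained networks: `bind` plays the role of a binding network, a set-membership test plays the role of "ChkMem", and a lookup in the rule table plays the role of RIN. All names are hypothetical. Because Table 2 is not reproduced here, the sketch assumes that the binding set consists of the six assignments of the distinct constants BO, MO and I to PERSON1, PERSON2 and PERSON3.

```python
from itertools import permutations

# NVN templates from the rule table (index i-1 holds pattern i).
TEMPLATES = [
    ("PERSON1", "HELP", "I"),
    ("I", "LIKE", "PERSON1"),
    ("PERSON1", "WITH", "PERSON2"),
    ("PERSON1", "KNOW", "COMPUTER"),
    ("PERSON1", "HELP", "PERSON2"),
    ("PERSON1", "LOGICAL", "_"),
    ("PERSON2", "WITH", "PERSON3"),
    ("PERSON1", "WITH", "PERSON3"),
    ("PERSON2", "WITH", "PERSON1"),
]
# Output unit -> input units that must all be active.
RULES = {2: [1], 4: [6], 5: [3, 4], 8: [3, 7], 9: [3]}
VARIABLES = ("PERSON1", "PERSON2", "PERSON3")


def bind(template, binding):
    """Symbolic stand-in for a binding network: substitute variables."""
    return tuple(binding.get(slot, slot) for slot in template)


def infer(facts, constants=("BO", "MO", "I")):
    """Run inference passes until no new proposition is produced."""
    memory = set(facts)
    # Assumption: the binding set holds all orderings of distinct constants.
    bindings = [dict(zip(VARIABLES, p)) for p in permutations(constants)]
    while True:
        working_set = set()
        for binding in bindings:
            nvns = [bind(t, binding) for t in TEMPLATES]
            active = [nvn in memory for nvn in nvns]      # "ChkMem"
            for out_unit, needed in RULES.items():        # stand-in for RIN
                if all(active[i - 1] for i in needed):
                    working_set.add(nvns[out_unit - 1])
        new = working_set - memory
        if not new:
            return memory
        memory |= new  # moved to working memory only after all bindings


facts = [("BO", "LOGICAL", "_"), ("BO", "WITH", "MO"), ("MO", "WITH", "I")]
for proposition in sorted(infer(facts)):
    print(proposition)
```

Under this assumption the loop stops with eleven propositions in working memory, the three given facts plus eight inferred ones, in agreement with the list above.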



4 Conclusion 

Many experiments have been conducted with CIM [3]. These experiments show that it is 
possible to implement an inference mechanism using neural networks. We have 
therefore fulfilled our goal of demonstrating that connectionist models are suitable for 
inferencing as well as for other cognitive and natural language processing tasks. 

The Connectionist Inference Mechanism (CIM) is a connectionist alternative to the 
inference module of the INFANT System. It combines the advantages of the symbolic 
approach, local representation, and parallel distributed processing. Moreover, the 
connectionist modules can be replaced with better neural network models or trained 
with other learning algorithms, whether these are models available now, new models 
of our own, or models developed in the future that are better suited for CIM. 



References 

1. Buchheit, P.: INFANT: A Connectionist-Like Knowledge Base and Natural 
Language Processing System. Ph.D. Thesis. University of Illinois at Chicago 
(1991) 

2. Buchheit, P.: A Neuro-propositional Model of Language Processing. International 
Journal of Intelligent Systems. Vol. 14 No. 6 (1999) 585-601 

3. Lalitrojwong, P.: Connectionism as an Inference Mechanism for a Natural 
Language Interface System. Ph.D. Thesis. Illinois Institute of Technology (1999) 

4. McClelland, J.L., Kawamoto, A.H.: Mechanisms of Sentence Processing: 
Assigning Roles to Constituents of Sentence. In: McClelland, J.L., Rumelhart, D.E. 
(eds.): Parallel Distributed Processing: Explorations in the Microstructure of 
Cognition, Vol. 2: Psychological and Biological Models. MIT Press, Cambridge, 
MA (1986) 272-325 

5. Miikkulainen, R.: Natural Language Processing with Subsymbolic Neural 
Networks. In: Browne, A. (ed.): Neural Network Perspectives on Cognition and 
Adaptive Robotics. Institute of Physics Press, Philadelphia, PA (1997) URL: 
ftp://ftp.cs.utexas.edu/pub/neural-nets/papers/miikkulainen.perspectives.ps.Z 

6. Tan, A.H.: Integrating Rules and Neural Computation. In: Proceedings of the 1995 
IEEE International Conference on Neural Networks, Vol. 4. Perth, Western 
Australia (1995) 1794-1799. 







7. Touretzky, D.S., and Hinton, G.E.: A Distributed Connectionist Production 
System. Cognitive Science. Vol. 12 No. 3 (1988) 423-466 

8. Westermann, G., and Goebel, R.: Connectionist Rules of Languages. In: 
Proceedings of the Seventeenth Annual Conference of the Cognitive Science 
Society. Pittsburgh, PA (1995) 236-241 




A Neural Network Document Classifier with Linguistic 

Feature Selection 



Hahn-Ming Lee, Chih-Ming Chen and Cheng-Wei Hwang 

Department of Electronic Engineering 
National Taiwan University of Science and Technology, Taipei, Taiwan 
hmlee@et.ntust.edu.tw 



Abstract. In this article, a neural network document classifier with linguistic 
feature selection and multi-category output is presented. It consists of a feature 
selection unit and a hierarchical neural network classification unit. In the feature 
selection unit, we extract terms from the original documents by text 
processing, and then analyze the conformity and uniformity of each term with an 
entropy function that measures the significance of the term. 
Terms with high significance are selected as input features for the neural 
network document classifiers. To reduce the input dimension, we 
apply a mechanism to merge synonyms. According to the uniformity 
analysis, we obtain a term similarity matrix by a fuzzy relation operation. By this 
method, we can construct a synonym thesaurus to reduce the input dimension. In 
the hierarchical neural network classification unit, we adopt the well-known 
back-propagation learning model to build proper hierarchical 
classification units. In our experiments, a product description database from an 
electronic commerce company is employed. The experimental results show 
that this classifier achieves sufficient accuracy to help human classification. It 
can save much manpower and working time when classifying a large database. 



1 Introduction 

In this article, we use hierarchical multilayer Artificial Neural Networks (ANN) 
with Information Retrieval techniques to perform document classification. To reduce 
the feature dimension of our document classification system, two effective 
dimensionality-reduction measures are applied: conformity and 
uniformity [1,2]. Conformity indicates that a significant term should occur in 
most documents that belong to a few categories, rather than being spread across most 
categories. Uniformity means that if a term is significant for a category, it should be 
spread across as many of the documents in that category as possible, rather than 
concentrated in only a few documents. Furthermore, to reduce the input 
dimension further, we apply a mechanism to merge synonyms. According to the 
uniformity analysis, we obtain a term similarity matrix by a fuzzy relation operation. 
By this method, we can construct a synonym thesaurus to reduce the input dimension. 



R. Loganantharaj et al. (Eds.): IEA/AIE 2000, LNAI 1821, pp. 555-560, 2000. 
© Springer-Verlag Berlin Heidelberg 2000 







From the experimental results, we show that this classification system achieves 
sufficient accuracy to help manual classification. The proposed system is adequate to 
aid the manual classification for product description documents. 

2 System Architecture 

To solve document classification problems, a hierarchical neural network 
classification system with linguistic feature selection is proposed. The basic unit of 
the classification system is a 3-layer Back-Propagation (BP) model [3]. The system 
architecture is shown in Figure 1. It consists of a feature selection unit and a 
hierarchical neural network classification unit. 

In the feature selection unit, we obtain textual terms from the original linguistic 
description documents via text processing, which includes word filtering and 
stemming [4], [5], [6]. Each word that passes through text processing is called a term. 
Since the significance of a term can be determined by its conformity and 
uniformity [1,2], we then measure each term's conformity and its uniformity to each 
category. The significant terms are treated as the features for the subsequent 
classifiers. To reduce the number of features, we use a fuzzy relation [7] to construct 
a brief synonym thesaurus and then merge the synonyms according to the results 
of the uniformity measure. 

The hierarchical neural network classification unit is constructed as a 3-level 
hierarchy, and the basic unit of this hierarchical system is a 3-layer BP model [3]. 
Each BP network is specialized to learn the knowledge of one category set. The classification 
unit produces multi-category output, and the output results are ranked. The results 
yielded by the BP networks at one level are propagated to the BP networks at the next level for further 
classification. We also rank and evaluate the outputs to determine those that are 
most desired. 




Fig. 1. System architecture 








2.1 Feature Selection Unit 

In the feature selection phase, text processing first extracts the terms from the original documents. 
Next, we compute the conformity and uniformity of each term, and then select 
the significant terms through a simple but effective selection method. To reduce the 
input dimension, a synonym merging process is applied. The main operations of the 
feature selection unit are detailed in the following subsections. 

2.1.1 Conformity and Uniformity 

(1) Conformity 

The conformity indicates how representative a term is for some categories. If a 
general term is used as a feature, it is difficult to distinguish among categories. 
In other words, a significant term should occur in documents that belong to 
some categories but should not be spread across most categories. Conformity can be measured 
by the ICF (inverted conformity frequency), which is defined as follows: 

$$ICF_j = \sum_{i=1}^{n} p_{ij} \log p_{ij}, \qquad p_{ij} = \frac{d_{ij}}{\sum_{i=1}^{n} d_{ij}} \tag{1}$$

where i indicates a category; j indicates a term; $d_{ij}$ is the document frequency of term 
j in category i; $p_{ij}$ is the probability that term j occurs in category i. 

(2) Uniformity 

The purpose of uniformity is as follows: if a term is meaningful for a category, it should 
occur in as many of the documents in that category as possible, rather 
than being concentrated in only a few documents. Thus, the uniformity can be measured as follows. 
For a category i, 

$$U_{ji} = \sum_{k} q_{kj} \log q_{kj}, \qquad q_{kj} = \frac{tf_{kj}}{\sum_{k} tf_{kj}} \tag{2}$$

where k indicates a document that belongs to category i; j indicates a term; $tf_{kj}$ is the 
term frequency of term j in document k; $q_{kj}$ is the probability that term j occurs in 
document k. 
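
Both measures have the same form, a sum of p log p over a normalized frequency distribution, so they can be sketched with one helper. The function names below are hypothetical, and the sketch assumes the sum is taken without a leading minus sign, so that values are at most zero and a value of zero means the term is concentrated in a single category or document; negating the result gives the conventional entropy.

```python
import math


def plogp_sum(counts):
    """Sum of p*log(p) over the distribution obtained by normalizing counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    # Terms with a zero count contribute nothing (p*log(p) -> 0 as p -> 0).
    return sum((c / total) * math.log(c / total) for c in counts if c > 0)


def icf(doc_freq_per_category):
    """Conformity of one term: counts are its document frequencies per category."""
    return plogp_sum(doc_freq_per_category)


def uniformity(term_freq_per_document):
    """Uniformity of one term in one category: counts are its term
    frequencies in each document of that category."""
    return plogp_sum(term_freq_per_document)


# A term found in only one of three categories versus one spread evenly.
print(icf([12, 0, 0]))   # concentrated: 0.0
print(icf([4, 4, 4]))    # spread evenly: -log(3)
```

With this sign convention, a term that is specific to few categories has an ICF close to zero, while a term that is widespread within a category has a strongly negative uniformity value.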

2.1.2 Synonym Thesaurus 

If two terms have similar significance to each category, they probably 
represent the same meaning and we can regard them as synonyms [1], [8]. The 
"significance of a term to each category" mentioned above is in fact the 
"uniformity of the term to each category" introduced in Section 2.1.1. Thus, we can 
measure the similarity of terms based on their uniformity, and then group similar 
terms to build a synonym thesaurus. In our model, a fuzzy relation [7] is adopted to 
measure the term similarity based on uniformity. 

To measure the term similarity, a composition of fuzzy relations is needed. The 
composition of fuzzy relations combines two fuzzy relations 
into a new fuzzy relation [7]. We use the Max-Min composition, which is one of the 
composition operators for fuzzy relations [9]. 

Assume that there are n categories and m terms. We can define a composition of 
fuzzy relations for a term as follows: 

$$A_j \circ U = R_j \tag{3}$$







where $A_j$ is a 1×n matrix whose elements are the uniformity of term j to each 
category, U is an n×m matrix that includes the uniformity of all terms to all categories, 
and $R_j$ is a 1×m matrix that indicates the similarity between term j and the other terms. 

As the result of this composition, we get a matrix $R_j$ for term j, and the elements 
in this matrix represent the similarities between term j and the other terms. Once we 
complete the fuzzy relation composition for all terms, the similarity between every 
pair of terms has been measured. 
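
A minimal sketch of the Max-Min composition for one term is given below. The function name and the sample numbers are hypothetical, and the sketch assumes the uniformity values have been scaled into [0, 1] so that they can be read as fuzzy membership grades.

```python
def max_min_composition(a, u):
    """Max-Min composition of a 1 x n row vector with an n x m matrix.

    a[i]    : membership grade of the chosen term in category i
    u[i][j] : membership grade of term j in category i
    Returns a list r of length m with r[j] = max_i min(a[i], u[i][j]).
    """
    n, m = len(a), len(u[0])
    return [max(min(a[i], u[i][j]) for i in range(n)) for j in range(m)]


# Two categories, three terms; the row vector is the first column of u,
# so the first entry of the result is the term's similarity to itself.
u = [
    [0.2, 0.8, 0.1],
    [0.9, 0.3, 0.5],
]
a = [row[0] for row in u]
print(max_min_composition(a, u))  # [0.9, 0.3, 0.5]
```

Repeating the call with each column of `u` as the row vector yields the full m×m term similarity matrix.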



3 Experimental Results 

Our training and testing data set comes from All Products Online Corporation 
(www.allproducts.com). In their databases, a document corresponds to a product 
and describes the properties of this product. The description of a product is organized 
in fields, such as product name, specification, main features and keywords for 
indexing. In addition, each product is manually classified into one or more categories 
according to the code book. 

3.1 Dimensionality Reduction by Feature Selection 

In this section, we present the experimental results of dimensionality reduction by our 
proposed feature selection method, shown in Table 1. In the first stage, our proposed 
method selects significant terms according to the measures of conformity (ICF) and 
uniformity (U); as a result, almost one thousand terms and more than three thousand 
terms are eliminated in set 1 and set 2, respectively. In the second stage, we apply our 
proposed synonym merging method to the terms selected in the first stage. Some 
synonym groups are formed and the number of selected features is further 
reduced to 1360 in set 1 and 1722 in set 2. 



Table 1. Dimensionality reduction by feature selection 

Document set                                     1 (500 documents)   2 (3000 documents) 
Original words                                   3013                7123 
Original terms (after text processing)           2469                5390 
Selected terms (after selection by ICF and U)    1528                1854 
Selected features (after synonym merging)        1360                1722 

Threshold of ICF: < 85% ICF_max; threshold of U: > 25% U_max; threshold of S: > 25% S_max 
Number of synonym groups: 14 groups for 500 documents; 27 groups for 3000 documents 







3.2 Classification Results 

3.2.1 Training Phase 

Table 2 shows the accuracy of each layer in the training phase. Note that we only 
measure the exact-match in the training phase because the training results are 
acceptable if and only if the exact-match rate reaches a satisfactory level. 

3.2.2 Testing Phase 

Table 3 shows the accuracy of each layer in the testing phase. We measure precision, 
recall and the rates of five output situations. Average recall is 
better than average precision at each level. This phenomenon is predictable and 
reasonable. Since our training documents are multi-category and manually 
pre-classified, it is not easy to output results that exactly match the desired outputs. 
Furthermore, our system discovers some output categories that are relevant to a document 
but are not included in its desired output set. 

The number of actual output categories is usually larger than the number of desired 
output categories. In other words, the coverage-match situation occurs very 
frequently. This is why the average recall is better than the average 
precision. In our multi-category case, we emphasize discovering more potential 
categories that may not yet have been discovered by manual classification (i.e. promoting 
the recall), although the precision is merely adequate. 
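
The relationship between the output situations and precision/recall can be illustrated with a short sketch. The text does not define the five situations formally, so the definitions below are assumptions read off their names: exact-match when the actual and desired category sets are equal, coverage-match when the actual set strictly contains the desired set, subset-match when it is strictly contained in it, overlap-match when the two sets share some but not all categories, and no-match when they are disjoint. The function names and category labels are hypothetical.

```python
def match_situation(actual, desired):
    """Classify one document's output into one of five situations (assumed definitions)."""
    actual, desired = set(actual), set(desired)
    if actual == desired:
        return "exact-match"
    if actual > desired:      # actual strictly contains desired
        return "coverage-match"
    if actual < desired:      # actual strictly contained in desired
        return "subset-match"
    if actual & desired:
        return "overlap-match"
    return "no-match"


def precision_recall(actual, desired):
    """Per-document precision and recall over category sets."""
    actual, desired = set(actual), set(desired)
    hits = len(actual & desired)
    precision = hits / len(actual) if actual else 0.0
    recall = hits / len(desired) if desired else 0.0
    return precision, recall


# A coverage-match: every desired category is found, plus two extra ones.
actual = {"cables", "connectors", "adapters"}
desired = {"cables"}
print(match_situation(actual, desired))   # coverage-match
print(precision_recall(actual, desired))  # precision 1/3, recall 1.0
```

Under these assumed definitions a coverage-match always has a recall of 1 and a precision below 1, which is consistent with recall exceeding precision when coverage-matches dominate.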

The accuracy at level 1 is acceptable, but the coverage-match situation becomes less frequent 
at levels 2 and 3. Meanwhile, the overlap-match situation becomes more frequent. 
These problems are caused by noise in the training documents, which makes it difficult to select 
significant features to represent such categories. 

Table 2. The accuracy of each layer in training phase 

Average accuracy of exact-match 

Level   Document set 1 (500 documents)   Document set 2 (3000 documents) 
1       90.1%                            100% 
2       87.8%                            100% 
3       85.2%                            100% 



4 Conclusion 

In this article, we propose a 3-level hierarchical neural network classification system 
for linguistic documents by applying BP neural networks. This classification system 
contains an effective feature selection to analyze linguistic terms and select significant 
terms based on the analysis of conformity, uniformity and a fuzzy synonym thesaurus. 
In the 3-level hierarchical neural network classification system, the hierarchy 
corresponds to the given category structure. Each BP classifier represents a 
parent-category and classifies documents into its child-categories. 

The effect of our classifier is tested by employing a product description document 
database which is provided by All Products Online Corporation. The experimental results 









show that this classifier achieves sufficient accuracy to help human classification. It 
can save much manpower and working time for classifying a large database. 

Table 3. The accuracy of each layer in testing phase 

Document set 1 (500 documents) 

Level   Average     Average   Exact-   Coverage-   Subset-   Overlap-   No- 
        precision   recall    match    match       match     match      match 
1       37%         91%       3%       86%         0%        9%         0% 
2       52%         87%       10%      69%         4%        14%        0% 
3       48%         77%       14%      51%         0%        33%        0% 

Document set 2 (3000 documents) 

Level   Average     Average   Exact-   Coverage-   Subset-   Overlap-   No- 
        precision   recall    match    match       match     match      match 
1       21%         76%       1%       72%         0%        26%        0% 
2       24%         74%       1%       63%         0%        34%        0% 
3       17%         66%       0%       54%         0%        45%        0% 


References 

1. Salton, G.: Automatic Text Processing: The Transformation, Analysis, and 
Retrieval of Information by Computer. Addison Wesley (1989) 

2. Yun-Long Huang: A Theoretic and Empirical Research of Cluster Indexing for 
Mandarin Chinese Full Text Document. The Journal of Library and Information 
Science. 24 (1998) 1023-2125 (in Chinese) 

3. Rumelhart, D. E., Hinton, G. E., and Williams, R. J.: Learning Internal 
Representations by Error Propagation. Parallel Distributed Processing. Vol. 1. MIT 
Press (1986) 

4. Luhn, H. P.: A Statistical Approach to Mechanized Encoding and Searching of 
Literary Information. IBM Journal of Research and Development. Vol. 1, No. 4. 
(1957) 

5. Porter, M. E.: Competitive Strategy: Techniques for Analyzing Industries and 
Competitors. New York: Free Press (1980) 

6. Francis, W., and Kucera, H.: Frequency Analysis of English Usage. New York 
(1982) 

7. Zadeh, L. A.: Towards a Theory of Fuzzy Systems. Aspects of Networks and 
Systems Theory. New York. (1971) 469-490 

8. William B. Frakes, Ricardo Baeza-Yates: Information Retrieval: Data Structures 
& Algorithms. Prentice Hall PTR (1992) 

9. George J. Klir, Bo Yuan: Fuzzy Sets and Fuzzy Logic: Theory and Applications. 
Prentice Hall PTR (1995) 






Color Pattern Recognition on the Random Neural 
Network Model 



Jose Aguilar and Valentina Rossell 

CEMISID. Dpto. de Computacion 
Facultad de Ingenieria. Universidad de los Andes 
Av. Tulio Febres. Merida, 5010, Venezuela 
aguilar@ing.ula.ve 



Abstract. The purpose of this paper is to describe the use of the multiple 
classes random neural network model to learn various patterns having different 
colors. We propose a learning algorithm for the recognition of color patterns 
based upon the non-linear equations of the multiple classes random neural 
network model using gradient descent of a quadratic error function. Our model 
is defined for nC parameters for the whole network, where C is the number of 
colors, n is the number of pixels of the image, and each neuron is used to obtain 
the color value of each pixel in the bit map plane. 



1. Introduction 

The Random Neural Network (RNN) was proposed by Gelenbe in 1989 [6,7,8]. 
This model calculates the probability of activation of the neurons in the network. 
Signals in this model take the form of impulses which mimic what is presently known 
of inter-neural signals in biophysical neural networks. The RNN has been used to 
solve optimization [1,2,3] and pattern recognition problems [4,5]. Fourneau and 
Gelenbe have proposed an extension of the RNN, the Multiple Classes Random Neural 
Network (MCRNN) [9]. The problem addressed in this paper is the design 
of a learning algorithm for the recognition of color patterns, using the MCRNN. We 
use each class to model a color. We present a backpropagation-type learning algorithm for 
the MCRNN, using gradient descent of a quadratic error function when a set of 
input-output pairs is presented to the network. It requires the solution of a system of 
nC non-linear equations each time the n-neuron network learns a new input-output 
pair (an n-pixel image with C colors). This work is organized as follows. In section 2 the 
theoretical bases of the MCRNN are reviewed. Section 3 presents our learning algorithm 
for the MCRNN. In section 4, we present color pattern recognition applications. Remarks 
concerning future work and conclusions are provided in section 5. 



2. The Multiple Classes Random Neural Model 

The MCRNN is composed of n neurons and receives exogenous positive (excitatory) 
and negative (inhibitory) signals as well as endogenous signals exchanged by the 
neurons. Excitatory and inhibitory signals are sent by neurons when they fire, to other 

R. Loganantharaj et al. (Eds.): IEA/AIE 2000, LNAI 1821, pp. 561-566, 2000. 
© Springer-Verlag Berlin Heidelberg 2000 







neurons in the network or to the outside world. In this model, positive signals may belong 
to several classes, and the potential at a neuron is represented by the vector 
$K_i = (K_{i1}, \ldots, K_{iC})$, where $K_{ic}$ is the value of the "class c potential" of neuron i, or its 
"excitation level in terms of class c signals"; negative signals belong to a 
single class only. The total potential of neuron i is $K_i = \sum_{c=1}^{C} K_{ic}$. When a positive signal 
of class c arrives at a neuron, it merely increases $K_{ic}$ by 1; when a negative 
signal arrives at it, if $K_i > 0$, the potential is reduced by 1, and the class of the potential 
to be reduced is chosen randomly with probability $K_{ic}/K_i$ for any c = 1, ..., C. 
Exogenous positive signals of class c arrive at neuron i in a Poisson stream of rate 
$\Lambda(i, c)$, while exogenous negative signals arrive at it according to a Poisson process of 
rate $\lambda(i)$. A neuron is excited if its potential is positive ($K_i > 0$). It then fires, at 
exponentially distributed intervals, and sends excitatory signals of different classes, or 
inhibitory signals, to other neurons or to the outside of the network. Neuron i 
sends excitatory signals of class c at rate $r(i, c) > 0$, with probability $K_{ic}/K_i$. When the 
neuron fires at rate $r(i, c)$, it reduces its class c potential by 1 and sends to neuron j a 
class $\varphi$ positive signal with probability $p^+(i, c; j, \varphi)$, or a negative signal with 
probability $p^-(i, c; j)$. On the other hand, the probability that the deleted signal is sent 
out of the network is $d(i, c)$. Let $q(i, c)$ with $0 < q(i, c) < 1$ be the solution of the 
system of non-linear equations: 

$$q(i, c) = \frac{\lambda^+(i, c)}{r(i, c) + \lambda^-(i)} \tag{1}$$

where 

$$\lambda^+(i, c) = \sum_{(j, \varphi)} q(j, \varphi)\, r(j, \varphi)\, p^+(j, \varphi; i, c) + \Lambda(i, c)$$

$$\lambda^-(i) = \sum_{(j, \varphi)} q(j, \varphi)\, r(j, \varphi)\, p^-(j, \varphi; i) + \lambda(i)$$

The synaptic weights for positive ($w^+(j, \varphi; i, c)$) and negative ($w^-(j, \varphi; i)$) signals are 
defined as: 

$$w^+(j, \varphi; i, c) = r(j, \varphi)\, p^+(j, \varphi; i, c), \qquad w^-(j, \varphi; i) = r(j, \varphi)\, p^-(j, \varphi; i)$$

and 

$$r(j, \varphi) = \Big[ \sum_{(i, c)} w^+(j, \varphi; i, c) + \sum_{(i, c)} w^-(j, \varphi; i) \Big]$$
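
To make the role of equation (1) concrete, the sketch below solves it for a tiny network by simple fixed-point iteration. The paper does not state which numerical method was used, so the iteration scheme, the clipping of q to at most 1, the function name `solve_q` and the nested-list layout of the parameters are all assumptions made for illustration.

```python
def solve_q(w_plus, w_minus, Lam, lam, r, iters=500, tol=1e-12):
    """Fixed-point iteration for q(i, c) in equation (1).

    w_plus[j][p][i][c] : weight of positive signals from (neuron j, class p) to (i, c)
    w_minus[j][p][i]   : weight of negative signals from (neuron j, class p) to neuron i
    Lam[i][c], lam[i]  : exogenous positive and negative arrival rates
    r[i][c]            : firing rates
    """
    n, C = len(r), len(r[0])
    q = [[0.0] * C for _ in range(n)]
    for _ in range(iters):
        new_q = [[0.0] * C for _ in range(n)]
        for i in range(n):
            lam_minus = lam[i] + sum(
                q[j][p] * w_minus[j][p][i] for j in range(n) for p in range(C)
            )
            for c in range(C):
                lam_plus = Lam[i][c] + sum(
                    q[j][p] * w_plus[j][p][i][c] for j in range(n) for p in range(C)
                )
                # Clipping keeps q a probability (an assumption of this sketch).
                new_q[i][c] = min(1.0, lam_plus / (r[i][c] + lam_minus))
        delta = max(abs(new_q[i][c] - q[i][c]) for i in range(n) for c in range(C))
        q = new_q
        if delta < tol:
            break
    return q


# Two neurons, one class: neuron 0 receives exogenous signals at rate 0.4
# and excites neuron 1 with weight 0.5; there are no inhibitory weights.
w_plus = [[[[0.0], [0.5]]], [[[0.0], [0.0]]]]
w_minus = [[[0.0, 0.0]], [[0.0, 0.0]]]
q = solve_q(w_plus, w_minus, Lam=[[0.4], [0.0]], lam=[0.0, 0.0], r=[[1.0], [1.0]])
print(q)  # q(0,0) is about 0.4 and q(1,0) about 0.2
```

In this feed-forward example the iteration settles after a few passes; for recurrent weight matrices convergence of such a simple scheme is not guaranteed and would need to be checked.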



3. Learning Algorithm on the Multiple Classes Random Neural 
Network Model 

We propose a gradient descent algorithm for choosing the set of network parameters 
$w^+(j, z; i, c)$ and $w^-(j, z; i)$ in order to learn a given set of m input-output pairs (X, Y), 
where the set of successive inputs is denoted by: 

$$X = \{X_1, \ldots, X_m\}, \qquad X_k = \{X_k(1, 1), \ldots, X_k(n, C)\}$$

where $X_k(i, c)$ describes class c on neuron i for pattern k: 

$$X_k(i, c) = \{\Lambda_k(i, c), \lambda_k(i)\}$$

and the successive desired outputs are the vectors 

$$Y = \{Y_1, \ldots, Y_m\}$$

where $Y_k = \{Y_k(1, 1), \ldots, Y_k(n, C)\}$ and $Y_k(i, c) \in \{0, 0.5, 1\}$. 

The values $\Lambda_k(i, c)$ and $\lambda_k(i)$ provide the network stability. In particular, in our model 
$\Lambda(i, c) = L_{ic}$ and $\lambda(i) = 0$, where $L_{ic}$ is a constant for class c of neuron i. The $X_k(i, c)$ 
are initialized as follows: 

$$Y_k(i, c) > 0 \Rightarrow X_k(i, c) = (\Lambda_k(i, c), \lambda_k(i)) = (L_{ic}, 0)$$

$$Y_k(i, c) = 0 \Rightarrow X_k(i, c) = (\Lambda_k(i, c), \lambda_k(i)) = (0, 0)$$
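
The initialization rule can be written directly as a small function. This is only a sketch: the function name `init_inputs`, the nested-list layout and the constant 0.8 used in the example are hypothetical, since the values of the constants are not given here.

```python
def init_inputs(Y, L):
    """Build the input rates for one pattern from its desired outputs.

    Y[i][c] : desired output of class c on neuron i (0, 0.5 or 1)
    L[i][c] : constant exogenous positive rate for class c on neuron i
    Returns X with X[i][c] = (positive rate, negative rate).
    """
    return [
        [(L[i][c] if Y[i][c] > 0 else 0.0, 0.0) for c in range(len(Y[i]))]
        for i in range(len(Y))
    ]


# Two pixels with three color classes each (hypothetical constant rate 0.8).
Y = [[1, 0, 0], [0.5, 0, 0]]
L = [[0.8, 0.8, 0.8], [0.8, 0.8, 0.8]]
X = init_inputs(Y, L)
print(X[0])  # [(0.8, 0.0), (0.0, 0.0), (0.0, 0.0)]
```

Only classes with a positive desired output receive exogenous excitation, and the exogenous inhibitory rate is zero everywhere, as in the rule above.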

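The rule to update the weights may be written as: 

$$w_k^+(u, p; v, z) = w_{k-1}^+(u, p; v, z) - \mu \sum_{i=1}^{n} \sum_{c=1}^{C} \big(q_k(i, c) - y_k(i, c)\big) \big[\partial q(i, c) / \partial w^+(u, p; v, z)\big]_k$$

$$w_k^-(u, p; v) = w_{k-1}^-(u, p; v) - \mu \sum_{i=1}^{n} \sum_{c=1}^{C} \big(q_k(i, c) - y_k(i, c)\big) \big[\partial q(i, c) / \partial w^-(u, p; v)\big]_k \tag{2}$$

where $\mu > 0$ is the learning rate (some constant). 

$q_k(i, c)$ is calculated using $X_k$, $w_k^+(u, p; v, z) = w_{k-1}^+(u, p; v, z)$ and 
$w_k^-(u, p; v) = w_{k-1}^-(u, p; v)$ in (1). 

$[\partial q(i, c) / \partial w^+(u, p; v, z)]_k$ and $[\partial q(i, c) / \partial w^-(u, p; v)]_k$ are evaluated using the 
values $q(i, c) = q_k(i, c)$, $w_k^+(u, p; v, z) = w_{k-1}^+(u, p; v, z)$ and 
$w_k^-(u, p; v) = w_{k-1}^-(u, p; v)$ in (2), 

and 

$$\partial q(i, c) / \partial w^+(u, p; v, z) = \gamma^+(u, p; v, z) / q(u, p)\, [I - W]^{-1}$$

$$\partial q(i, c) / \partial w^-(u, p; v) = \gamma^-(u, p; v) / q(u, p)\, [I - W]^{-1}$$

if (u = i) and (v ≠ i) then 
    $\gamma^+(u, p; v, z) = -1/D(i, c)$ 
    $\gamma^-(u, p; v) = -1/D(i, c)$ 

if (u = i) and (v = i) then 
    $\gamma^+(u, p; v, z) = 0$ 
    $\gamma^-(u, p; v) = -(1 + q(i, c))/D(i, c)$ 

if (u ≠ i) and (v = i) then 
    $\gamma^+(u, p; v, z) = 1/D(i, c)$ 
    $\gamma^-(u, p; v) = -q(i, c)/D(i, c)$ 

if (u ≠ i) and (v ≠ i) then 
    $\gamma^+(u, p; v, z) = 0$ 
    $\gamma^-(u, p; v) = 0$ 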





finally, 

$$D(i, c) = r(i, c) + \sum_{j=1}^{n} \sum_{z=1}^{C} q(j, z)\, w^-(j, z; i)$$

$$W = \big[w^+(j, z; i, c) + w^-(j, z; i)\, q(j, z)\big] / D(j, z)$$

The complete learning algorithm for the network is: 

- Initiate the matrices $W_0^+$ and $W_0^-$ in some appropriate manner. Choose a value of 
  the learning rate $\mu$ in (2). 

- For each successive value of k, from 1 to m: 

  - Set the input-output pair $(X_k, Y_k)$ 

  - Repeat 

    - Solve equation (1) with these values 

    - Using (2) and the previous results, update the matrices $W_k^+$ and $W_k^-$ 

    Until the change in the new values of the weights is smaller than some 
    predetermined value. 



4. Color Pattern Recognition Problem on the Multiple Classes 
Random Neural Network Model 

4.1 Problem Definition 

We now show how the MCRNN can be used to solve the color pattern recognition 
problem. In this section, we present several examples to compare the quality of our 
learning algorithm for different pattern types. In our approach, each "signal class" 
represents a color. To design such a memory, we have used a single-layer 
MCRNN of n fully interconnected neurons. For every neuron i, the probability 
that emitted signals depart from the network is d(i, c)=0. We suppose a pattern 
composed of n points (j, k) in the plane (for j=1, ..., J and k=1, ..., K). We associate a 
neuron N(i) with each point (j, k) in the plane (for i=1, ..., n; j=1, ..., J and k=1, ..., K). 
The state of N(i) can be interpreted as the color intensity value of the pixel (j, k). That 
is, each pixel is represented by a neuron. In addition, we suppose three classes to 
represent the primary colors (red, green, and blue) according to the RGB model. This 
model allows different colors to be created by combining different intensities of the 
primary colors. For example, to represent a pixel with red color the neuron value 
is (1, 0, 0), the black color is (1, 1, 1), the pink color is (0.5, 0, 0), etc. We suppose 
values equal to 0, 0.5 and 1 for each class on every neuron. In this way, we can 
represent geometric figures with different combinations of colors. The parameters of 
the neural network are chosen as follows: $p^+(j, \varphi; i, c) = p^+(i, c; j, \varphi)$ and 
$p^-(i, c; j) = p^-(j, c; i)$, for all i, j = 1, ..., n and c, $\varphi$ = 1, ..., C. We input various geometric 
figures to the MCRNN and train the network to recognize these as separate categories. To 
evaluate our learning algorithm, we use a set of figures (group A) composed of the 
figures shown in figure 1, where blackened boxes represent blue colors, gray boxes 
represent green colors and white boxes represent red colors. Each figure is represented 
by a 6*6 grid of pixels. Thus, we use a single-layer MCRNN composed of 36 neurons 
(n=36) and 3 classes (C=3). 








Fig. 1. Geometric Figures with three colors. 



4.2 Results Analysis 

The results for the first group are presented in figure 2. To evaluate the performance 
of the learning algorithm, we show the minimal errors reached during the learning 
phase and the corresponding execution times. These values are the average of 8 runs for 
each set Si of images. The algorithm provides good error convergence in the 
learning phase. In particular, the learning of the sets S4 and S5 remains good for our 
learning algorithm. For S2 and Sg, the error cost increases. 



Fig. 2. Learning error and execution time (min) of the learning algorithms 

To test the associative memories, we evaluated the recognition rates of distorted
versions of the training patterns using the recognition algorithm proposed in [4].
The network obtained in the learning stage is the initial network of this second
phase (retrieval stage). For each training image and each distortion rate, we
generated 20 noisy images and used them as inputs. The images were corrupted at
noise rates of 0%, 15% and 30% by modifying bit values at random. A pattern is
recognized if the residual error rate is less than 3. The reported values are the
average of 8 runs for each set $S_i$ of images, and the results are presented in
Figure 3. Recognition rates fall as the noise rate grows (the memories are then more
discriminating), but the algorithm still provides a good recognition rate. In
particular, the recognition rate remains good for the sets $S_4$ and $S_5$. For
$S_{10}$ at a 30% noise rate, the recognition rate decreases.
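
The corruption step described above can be sketched as follows. This is only an illustration under stated assumptions: the paper speaks of modifying bit values at random, and the sketch interprets that as recolouring a fixed fraction of the 36 pixels; the colour encoding and the names `distort` and `noisy_versions` are hypothetical.

```python
import random

COLORS = (0, 1, 2)  # assumed encoding: 0 = red, 1 = green, 2 = blue


def distort(image, rate, rng):
    """Return a copy of `image` with round(rate * pixels) pixels recoloured."""
    flat = [pixel for row in image for pixel in row]
    n_changed = round(rate * len(flat))
    for idx in rng.sample(range(len(flat)), n_changed):
        # Pick a colour different from the current one so the pixel really changes.
        flat[idx] = rng.choice([c for c in COLORS if c != flat[idx]])
    width = len(image[0])
    return [flat[i:i + width] for i in range(0, len(flat), width)]


def noisy_versions(image, rate, count=20, seed=0):
    """Generate `count` distorted copies of one training image."""
    rng = random.Random(seed)
    return [distort(image, rate, rng) for _ in range(count)]
```

Calling `noisy_versions(image, 0.15)` on a 6×6 image yields 20 copies, each differing from the original in 5 pixels (15% of 36, rounded).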




566 Jose Aguilar and Valentina Rossell 



Noise rate            0%                15%               30%
Number of figures     4     6     10    4     6     10    4     6     10
Recognition rate      99%   99%   97%   88%   87%   84%   73%   71%   67%

Fig. 3. Recognition rate of noisy versions of figures.



5. Conclusions 

In this paper, we have proposed a learning algorithm based on the Multiple Classes
Random Neural Model. We have shown that this model can work efficiently as an
associative memory. We can learn arbitrary colour images with this algorithm, but the
processing time increases rapidly with the number of pixels and colours used. The
number of neurons is dictated by the image resolution (in our case, we tested images
of 6×6 pixels). During the learning phase, we met classical problems such as the
existence of local minima and long learning times. However, most of the computations
are intrinsically parallel and can be implemented on SIMD or MIMD architectures.
Future work will study a new retrieval algorithm adapted to these types of figures.



References 

1. Aguilar, J.: Evolutionary Learning on Recurrent Random Neural Network. Proc. of 
the World Congress on Neural Networks, International Neural Network Society 
(1995) 232-236. 

2. Aguilar, J.: An Energy Function for the Random Neural Networks. Neural 
Processing Letters 4 (1996) 17-27. 

3. Aguilar, J.: Definition of an Energy Function for the Random Neural to solve 
Optimization Problems. Neural Networks 11 (1998) 731-738. 

4. Aguilar, J., Colmenares A.: Resolution of Pattern Recognition Problems using a 
Hybrid Genetic/Random Neural Network Learning Algorithm. Pattern Analysis 
and Applications 1 (1998) 52-61. 

5. Atalay, V., Gelenbe, E., Yalabik, N.: The random neural network model for texture
generation. Intl. Journal of Pattern Recognition and Artificial Intelligence 6 (1992)
131-141.

6. Gelenbe, E.: Random neural networks with positive and negative signals and
product form solution. Neural Computation 1 (1989) 502-511.

7. Gelenbe, E.: Stability of the random neural networks. Neural Computation 2 (1990)
239-247.

8. Gelenbe, E.: Learning in the recurrent random neural network. Neural Computation
5 (1993) 325-333.

9. Fourneau, M., Gelenbe, E., Suros, R.: G-networks with Multiple classes of negative
and positive customers. Theoretical Computer Science 155 (1996) 141-156.





Integrating Neural Network and Symbolic Inference for 
Predictions in Food Extrusion Process 

Ming Zhou¹ and James Paik²

¹ Indiana State University, Department of Industrial & Mechanical Technology
Terre Haute, IN 47809

² W.K. Kellogg Institute, Battle Creek, MI 49016



Abstract. Predicting process outputs in a food extrusion process is a difficult
task due to multiple variables and their highly nonlinear relationship.
Experimental data have been collected by earlier researchers to fit statistical
models to identify process conditions that result in the “best” output. A neural
network is developed and trained with experimental data to capture the process
knowledge, and map the relationship between process variables and process
output. An expert system is developed that uses the neural network as an
inference engine component to make exact predictions. It also has a knowledge
base that contains a set of symbolic rules. This allows the system to provide a
more comprehensible form of prediction that helps engineers gain a better
understanding of the problem dynamics.



1. Introduction 

The food extrusion process is a continuous production process that involves a number
of process variables (e.g. moisture content, mass feed rate, barrel temperature). The
presence of multiple process variables and their complex interactions makes the
analytical modeling of the process extremely difficult [6]. Traditionally, experimental
methods have been used to identify influential factors and optimal process conditions.
The process output is usually a set of quality characteristics (e.g., texture properties).
For instance, Response Surface Methodology (RSM) has been used in extrusion
process analysis to identify the set of parameter values that optimizes process
outputs [7,8]. These experimental studies have accumulated a rich source of empirical
data that contains useful information and knowledge about the food extrusion process.
Engineers can use such information and knowledge to improve the process and
develop better products. Unfortunately, much of this resource is not utilized due to the
lack of tools that can capture and extract the process knowledge embedded in raw data.

Neural networks have been used successfully in pattern recognition and data
mining. They can capture hidden patterns in raw data and represent complex
relationships between sets of variables. This property makes the neural network a
popular and effective tool in process control and analysis [4]. With a properly
designed architecture and adequate training data, a neural network is capable of
learning process dynamics and making predictions of process output based on the
learned knowledge. In spite of these advantages, the neural network has been
criticized [1] for its “black box” nature, i.e. it is hard for humans to understand
why a neural net works. It is important that the knowledge learned by a neural network can be used to



R. Loganantharaj et al. (Eds.): IEA/AIE 2000, LNAI 1821, pp. 567-572, 2000.
© Springer-Verlag Berlin Heidelberg 2000







help engineers gain a better understanding of a complicated process. Traditional expert
systems provide a more comprehensible form of inference, called symbolic inference,
that can be easily understood and interpreted by humans [2], and have thus gained
much popularity in applications such as engineering design, process control
and analysis. The process knowledge embedded in a trained neural network can be
used to develop the knowledge base and inference engine of an expert system [4]. No
such effort has been found in the food extrusion area. An intelligent system that
integrates a neural network and traditional symbolic inference can predict process
output accurately, and allow food engineers to develop better understanding and
experience through the decision-making process. This would greatly benefit product
development, process design and improvement in food process engineering. This study
investigates neural networks that can effectively learn the food extrusion process and
develops an expert system that integrates a neural network for process prediction. The
application problem and data selected for the development and implementation of the
proposed system are from an early experimental study by Olkku and Vainionpaa [7].



2. Development of a Neural Network Prediction Model 

A feed-forward neural network is developed in this section. It takes a set of input 
values and predicts the value of an output. The input variables are five process 
variables in an extrusion process of starch-protein-sugar paste [7]. The definition and 
the range of input and output variables are given as follows: 

X1 = Protein to starch ratio (pure, dry basis), with a range R1 = [0, 0.20];
X2 = Sugar to starch-protein mixture ratio (pure, dry basis), with a range R2 = [0, 1];
X3 = Water content of mass feed into extruder (%), with a range R3 = [0.1, 0.3];
X4 = Set temperature of extruder barrel (°C), with a range R4 = [80, 120];
X5 = Mass feed rate into extruder (kg/min), with a range R5 = [0.2, 0.6].

Y (output variable) = breaking point (N), with a range of [0, 500].

The proposed network has three layers. The first is an input layer that contains five
nodes that simply receive input values. The second is a hidden layer with four
neurons. The last is an output layer with a single neuron for output Y (see
Figure 1). The transfer function is the tangent sigmoid function [4] for all neurons in
the hidden layer, and a linear function for the output neuron. Alternative network
architectures, such as 5-3-1, 5-5-1 and 5-6-1, were also investigated. All neural
network models are implemented with the MATLAB Neural Network Toolbox. A
total of 30 input-output data points are partitioned into a training set and a validation
set. The training set has 27 data points and is used to train the neural network to learn
the extrusion process. The validation set has 3 data points and is used to verify the
performance of the trained neural network. An efficient algorithm, the Levenberg-Marquardt
(LM) algorithm [3], is used to improve on simple backpropagation learning. The results
are presented in Figure 2 and Table 1. Figure 2 shows that for structure 5-4-1, the
error goal (mean-squared error) was achieved after only 108 epochs. The errors
achieved by the other structures (at 1000 epochs) and the validation results are shown
in Table 1. Note that structure 5-4-1 has the smallest total validation error (TVE). We







also compared the predictions made by the neural network with those by a regression 
model in the original study [7]. The average prediction error of the neural network 
is 2.35, while the average error by the regression model is 30.51. 
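
For readers who want to see the shape of the computation, the forward pass of a 5-4-1 network with a tangent sigmoid hidden layer and a linear output neuron can be sketched as below. This is not the authors' MATLAB model: the weights are random placeholders (the trained parameters are not listed in the paper), input scaling is omitted, and `forward` is a hypothetical helper name. The tangent sigmoid is mathematically the hyperbolic tangent.

```python
import math
import random


def forward(x, w_hidden, b_hidden, w_out, b_out):
    """5-4-1 network: tanh hidden layer followed by a linear output neuron."""
    hidden = [
        math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(w_hidden, b_hidden)
    ]
    return sum(w * h for w, h in zip(w_out, hidden)) + b_out


# Placeholder weights: random values stand in for the trained parameters,
# which the paper does not list.
rng = random.Random(0)
w_hidden = [[rng.uniform(-1, 1) for _ in range(5)] for _ in range(4)]
b_hidden = [rng.uniform(-1, 1) for _ in range(4)]
w_out = [rng.uniform(-1, 1) for _ in range(4)]
b_out = rng.uniform(-1, 1)

# One input vector (X1..X5); the output is meaningless until the weights are trained.
y = forward([0.04, 0.85, 0.11, 96, 0.24], w_hidden, b_hidden, w_out, b_out)
```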



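[Figure 1: diagram of the 5-4-1 network with input nodes X1 to X5]

[Figure 2: training error curve; the error goal is reached at 108 epochs]

Fig. 2. Result of neural network training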



Table 1. Average error and validation results for different network structures

Structure    E1000     TVE
5-3-1        0.017     24.7
5-4-1        <0.001    12.7
5-5-1        0.0241    19.6
5-6-1        0.0327    30.4






3. Development of a Neural Network Based Expert System 

Our goal in developing an intelligent system for prediction of the food extrusion
process is twofold: (1) the system should make an “exact” prediction of the output when
the user specifies a set of input values; (2) the system should provide a symbolic
prediction to help the user understand the process behavior. A schematic diagram
(Figure 3) describes the system. It consists of a user interface, a working memory, a
knowledge base, and an inference engine that has two independent units: a logic
inference processor (LIP) and a neural network (NN). Food engineers interact with the
system through the user interface. The working memory stores user-specified
information and intermediate search results. The knowledge base contains symbolic
decision rules. The inference engine combines facts and knowledge to produce both an
exact prediction and a symbolic one. To obtain the exact result, the system activates a
trained neural network to compute the output for the specified input values. The logic
inference processor (LIP) works on the principle of “Modus Ponens” [2]. It scans the
rules in the knowledge base and looks for the one whose premises match the specified
input pattern. If such a rule is found, it is set to “fire”, i.e. its conclusion is used as
the symbolic solution.
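
The scan-and-fire behavior of the LIP can be illustrated with a few lines of Python. The rule representation (a tuple of five input labels plus a conclusion) and the function name `fire` are assumptions made for this sketch; the two rules are the ones quoted as examples in this paper, not the full knowledge base.

```python
def fire(rules, pattern):
    """Return the conclusion of the first rule whose premises match `pattern`."""
    for premises, conclusion in rules:
        if premises == pattern:
            return conclusion  # the rule "fires"
    return None  # no rule matched the input pattern


# Illustrative two-rule knowledge base; premises are the labels of X1..X5.
rules = [
    (("HIGH", "LOW", "MED", "LOW", "MED"), "LOW"),
    (("LOW", "HIGH", "LOW", "MED", "LOW"), "HIGH"),
]

symbolic_result = fire(rules, ("HIGH", "LOW", "MED", "LOW", "MED"))
```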




[Figure 3: block diagram of the system, with outputs "Symbolic result" and "Exact result"]

Fig. 3. A neural network based expert system



To formulate symbolic rules (IF-THEN rules) and develop a knowledge base, we
partitioned the input space into a finite number of subspaces that can be characterized
by symbolic labels such as LOW, MEDIUM, and HIGH. Let $U$ be the input space;
then $U = U_1 \cup U_2 \cup \dots \cup U_m$, where each subspace $U_i$ is bounded by a set of
inequalities on the input variables. For any given input vector $x$, $x \in U_i$ implies that
$l_{1i} \le X_1 \le u_{1i}$; $l_{2i} \le X_2 \le u_{2i}$; $l_{3i} \le X_3 \le u_{3i}$; $l_{4i} \le X_4 \le u_{4i}$; $l_{5i} \le X_5 \le u_{5i}$.
The lower bounds $l_{ji}$ and upper bounds $u_{ji}$ satisfy $\bigcup_i [l_{ji}, u_{ji}] = R_j$, $\forall j$,
where $R_j$ is the range of input variable $X_j$. The range of output $Y$ is also partitioned
into a set of disjoint intervals $\{I_1, I_2, \dots, I_k\}$. Each interval is assigned a linguistic
label such as “LOW”, “HIGH”, etc. For this application, the range of process output is
divided into five intervals, labeled “VL”, “L”, “M”, “H”, and “VH” respectively, where
VL = Very Low, L = Low, M = Medium, H = High, and VH = Very High. The range of each
input variable is partitioned into three intervals that are characterized by a set of
linguistic labels {LOW, MED, HIGH}. The subintervals corresponding to the











linguistic labels for input and output variables are listed in Table 3. There is a total
of 243 input patterns (for five input variables, 3^5 = 243). Each of these input-output
patterns corresponds to a symbolic rule such as “If X1=HIGH & X2=LOW &
X3=MED & X4=LOW & X5=MED Then Y = LOW”.



Table 3. Labels and ranges of input/output subspaces

Labels    L (Low)         M (Medium)      H (High)
X1        [0, 0.06)       [0.06, 0.14]    (0.14, 0.20]
X2        [0, 0.33)       [0.33, 0.67]    (0.67, 1.0]
X3        [0.10, 0.16)    [0.16, 0.24]    (0.24, 0.30]
X4        [0.6, 0.73)     [0.73, 0.87]    (0.87, 1.0]
X5        [0.2, 0.33)     [0.33, 0.47]    (0.47, 0.6]

Y         VL       L          M          H          VH
Range     0-100    100-200    200-300    300-400    400-500



The knowledge used to formulate symbolic rules is obtained by computing input-output
patterns with the trained neural network and characterizing the patterns with the
linguistic terms defined earlier. However, due to the highly nonlinear relationship
between the inputs and output of the neural network, the output $Y$ may assume any
value in its range for a specified input pattern. It is therefore desirable to have some
measure of certainty for symbolic predictions of $Y$ (e.g., what are the chances that $Y$
is LOW, or $Y$ is HIGH?). We propose a probability factor for this purpose. Let $P_{ij}$ be
the conditional probability that $Y$ will be in interval $I_j$ given that $X$ is in subspace $U_i$
(for $i = 1$ to 243 and $j = 1$ to 5). To derive an estimate of $P_{ij}$, we sample a set of $n$
points in $U_i$, say $\{x_{i1}, \dots, x_{in}\}$, and compute their corresponding output values using
the trained neural network. This produces an output vector $Y_i = \{y_{i1}, y_{i2}, \dots, y_{in}\}$. Let
$n_{ij}$ be the number of output values in $Y_i$ that fall into interval $I_j$; then the relative
frequency ratio $n_{ij}/n$ can be used as an estimate for $P_{ij}$, i.e. $P_{ij} \approx n_{ij}/n$. For any
rule $i$, the sum of $P_{ij}$ over index $j$ equals one. A symbolic rule associated with
probability factors has the following form: if $x \in U_i$, then with a $100 P_{ij}\%$ chance the
output $Y$ will fall into interval $I_j$, $\forall j$. For example, a symbolic rule may look like this:
“If X1 is Low & X2 is High & ... & X5 is Medium Then Y is Low with probability 0.3 or
Y is Medium with probability 0.6 or Y is High with probability 0.1”.
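
The relative-frequency estimate described above can be sketched in Python. Several things here are assumptions rather than the authors' procedure: uniform sampling inside the subspace, half-open output bins, the sample size, and in particular `toy_predict`, a hypothetical stand-in for the trained network. The subspace bounds are copied from Table 3.

```python
import random
from collections import Counter

LABELS = ("VL", "L", "M", "H", "VH")


def label_output(y):
    """Map an output value to its interval label (half-open bins are an assumption)."""
    if y < 100:
        return "VL"
    if y < 200:
        return "L"
    if y < 300:
        return "M"
    if y < 400:
        return "H"
    return "VH"


def estimate_probabilities(predict, bounds, n=1000, seed=0):
    """Estimate P_ij as n_ij / n for one subspace U_i given as (low, high) bounds."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n):
        x = [rng.uniform(low, high) for low, high in bounds]
        counts[label_output(predict(x))] += 1
    return {label: counts[label] / n for label in LABELS}


def toy_predict(x):
    """Hypothetical stand-in for the trained network."""
    return 500 * x[1]


# Subspace X1=L, X2=H, X3=L, X4=M, X5=L, with bounds taken from Table 3.
bounds = [(0.0, 0.06), (0.67, 1.0), (0.10, 0.16), (0.73, 0.87), (0.2, 0.33)]
probs = estimate_probabilities(toy_predict, bounds)
```

By construction the five estimates for a subspace sum to one, matching the property stated for each rule.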



4. Numerical Experiments 

The proposed system was tested with numerical experiments to verify its performance 
and check for consistency. Two examples are given below for illustration purpose. 



Example 1: given an input vector X = {0.04, 0.85, 0.11, 96, 0.24}.

The pattern corresponding to X is classified as X1 = Low, X2 = High, X3 = Low,
X4 = Medium, and X5 = Low. The exact prediction is Y = 336.42, i.e. Y ∈ “High”.








The following rule was fired: “If X1=L & X2=H & X3=L & X4=M & X5=L Then Y=H
with probability 0.832 Or Y=M with probability 0.12 Or Y=L with
probability 0.028 Or Y=VL with probability 0.02 Or Y=VH with probability 0.00”.

Example 2: given an input vector X = {0.17, 0.5, 0.2, 96, 0.55}.

The pattern X is classified as X1 = High, X2 = Medium, X3 = Medium, X4 =
Medium, and X5 = High. The exact prediction is Y = 7.38, i.e. Y ∈ “VL” (Very
Low). The following rule was fired: “If X1=H & X2=M & X3=M & X4=M & X5=H Then
Y=VL with probability 0.994 Or Y=L with probability 0.006 Or Y=M with
probability 0.000 Or Y=H with probability 0.000 Or Y=VH with probability 0.000”.
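
The labelling used in the two examples can be reproduced with a small sketch based on the interval bounds of Table 3. One point is an assumption and is flagged in the code: Table 3 gives the X4 bounds on a 0.6 to 1.0 axis while the examples give the temperature as 96 °C, so the sketch divides X4 by 120 before labelling. The paper does not state this scaling; it is chosen here only because it reproduces the labels in both examples. The function names are hypothetical.

```python
# (low_upper, med_upper): L if x < low_upper, M if x <= med_upper, else H,
# following the interval brackets in Table 3.
INPUT_BOUNDS = [
    (0.06, 0.14),  # X1
    (0.33, 0.67),  # X2
    (0.16, 0.24),  # X3
    (0.73, 0.87),  # X4, on the scaled axis used in Table 3
    (0.33, 0.47),  # X5
]
TEMP_SCALE = 120.0  # assumption: X4 in degrees C is divided by 120 before labelling


def label_inputs(x):
    """Return the L/M/H label of each of the five input variables."""
    scaled = list(x)
    scaled[3] = scaled[3] / TEMP_SCALE
    labels = []
    for value, (low_upper, med_upper) in zip(scaled, INPUT_BOUNDS):
        if value < low_upper:
            labels.append("L")
        elif value <= med_upper:
            labels.append("M")
        else:
            labels.append("H")
    return labels


def label_output(y):
    """Return the label of the 100-wide interval that contains y (clamped to 0..500)."""
    names = ["VL", "L", "M", "H", "VH"]
    return names[max(0, min(int(y // 100), 4))]
```

With these bounds, `label_inputs([0.04, 0.85, 0.11, 96, 0.24])` gives the pattern of Example 1 and `label_output(336.42)` gives "H".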



5. Conclusions 

An intelligent system is developed to assist decision-making in the design, planning
or control of the food extrusion process. It combines a neural network with a
traditional expert system. The neural network captures process dynamics and provides
an accurate mapping between process variables and output. To help engineers
comprehend the process dynamics and make sensible decisions, the system also uses
symbolic rules to represent the neural network's knowledge. A sensitivity analysis of
the system was also conducted but is not presented in this paper due to space limitations.

This work is funded by the Kellogg Company and Indiana State University through 
a research grant. 



References 

1. Andrews, R., Diederich, J., and Tickle, A.B.: Survey and Critique of Techniques 
for Extracting Rules from Trained Neural Networks. Knowledge Based Systems. 
(1995) 1(2): 374-389. 

2. Durkin, J.: Expert Systems, Design and Development. Macmillan Publishing 
Company, New York. (1994) 

3. Hagan, M.T. and Menhaj, M.B.: Training Feedforward Networks with Marquardt 
Algorithm. IEEE Transactions on Neural Networks. (1994) 6: 989-993. 

4. Haykin, S.: Neural Networks, A Comprehensive Foundation. Macmillan College 
Publishing Company, New York. (1994) 

5. Mercier, F., Linko, P., and Harper, J.M.: Extrusion Cooking. AACC, St. Paul, 
Minnesota. (1989) 

6. Olkku, J. and Vainionpaa, J.: Response Surface Analysis of HTST
Extrusion Texturized Starch-Protein-Sugar Paste. Journal of Food Processing
Engineering. (1981) 821-826.

7. Vainionpaa, J.: Modeling of Extrusion Cooking of Cereals Using Response 
Surface Methodology. Journal of Food Engineering, (1991) 1:13-26. 




Automatic Priority Assignment to E-mail 
Messages Based on Information Extraction 
and User’s Action History 



Takaaki Hasegawa and Hisashi Ohara 



NTT Cyber Space Laboratories 
1-1 Hikari-no-oka, Yokosuka-Shi, Kanagawa 239-0847, Japan 
{hasegawa, ohara}@nttnly.isl.ntt.co.jp



Abstract. A method for assigning a priority to each message received 
is proposed that automatically extracts the message’s features from the 
message and forms the personal profile of each user. It forms the profile 
by monitoring the messages sent and received by the user. The personal 
profile of the user consists of three features: the topics extracted from the
message body, and the sender and the receiver(s) found in the message
header. The proposed method processes the message body to extract 
the features of the type and urgency of the message. The priority of a 
message is calculated as the weighted sum of these features using the 
weights of each feature as determined by multiple-regression analysis. 
An experiment demonstrates that the proposed method can be put to 
practical use such as ranking or filtering the many messages received. 



1 Introduction 

As the Internet has become widely accepted by the general public, E-mail is 
becoming a more common communication tool. E-mail is convenient because the 
sender can pass information to many people at one time with virtually no time 
delay even over long distances. E-mail is slowly replacing the telephone, facsimile, 
and postal mail and the number of e-mail messages is increasing exponentially. A 
daily batch of e-mail can include all sorts of information ranging from important 
or urgent matters to junk or direct mail. If a user receives many e-mail messages 
at one time, there is a risk that he or she may overlook the important e-mail 
messages so it is necessary to prioritize all incoming e-mail messages. 

The most common way of assigning a priority to each e-mail message is for the 
sender to set the priority level in the e-mail’s header. Most e-mail handling tools 
allow messages to be filtered by using this kind of information. However, this 
forces the receiver to accept the sender’s priority. It is impossible to automatically 
rank incoming e-mail messages in terms of the receiver’s profile, such as job, 
interests, and so on. 

We propose to rank e-mail messages using the following three steps. The first 
step is to extract the key features of topic, sender/receiver(s), message type, and
degree of urgency, from each incoming message. The second step is to monitor 



R. Loganantharaj et al. (Eds.): IEA/AIE 2000, LNAI 1821, pp. 573-582, 2000.
© Springer-Verlag Berlin Heidelberg 2000






the user’s response to incoming messages or the creation of new messages in order 
to learn his/her interest in different topics and his/her relations with different 
senders and receivers. The third step is to calculate the priority of a newly 
arrived message based on the message’s contents extracted in the first step and 
the personal profile of the user created in the second step. 

This paper describes the problems of e-mail communication and previous 
approaches in section 2, the proposed method for assigning e-mail priority in 
section 3, experiment results in section 4, a discussion in section 5, and the 
conclusion in section 6. 

2 Problems of E-mail Communication 

An e-mail message consists of a header as an envelope and a body as content. 
The specifications of the header are defined in the international standard format 
of ARPA Internet text messages [1]. A header consists of tags that have special 
meaning. For example, the To: line holds the e-mail address of the receiver, the 
From: line gives the e-mail address of the sender, the Subject: line holds the title 
of the message, and the Date: line gives the date when the message was sent. 
With the exception of these four headers, extension headers starting with “X-”
can be freely defined. For example, the X-Priority: line is normally used by the 
sender to indicate the priority of the message. 
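
As an illustration of the header fields just described, the sketch below reads them with Python's standard `email` package. The message text and addresses are made up for the example; only the header names come from the text.

```python
from email import message_from_string

# A made-up message used only to show the header fields named in the text.
raw = """\
From: boss@example.com
To: user@example.com
Subject: Budget report
Date: Mon, 19 Jun 2000 09:00:00 +0900
X-Priority: 1

Please submit the report by Friday.
"""

msg = message_from_string(raw)
sender = msg["From"]
receivers = msg.get_all("To", [])
subject = msg["Subject"]
sent_date = msg["Date"]
sender_priority = msg.get("X-Priority")  # None when the sender set no priority
body = msg.get_payload()
```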

These header specifications yield several simple ways of filtering e-mail
messages [7,6]. Receivers can classify the incoming e-mail messages according to the
sender or the subject. Moreover, receivers can display just those e-mail
messages that have high priorities as set by the sender. All of these approaches are
flawed to some extent. The problem with using keywords is that the receiver
does not know whether his or her pre-defined keywords will be suitable. The sender’s
priority often does not match that of the receiver. Because most senders use few
words in the subject field, it is impossible to rank e-mail messages accurately
using only this information. Accordingly, a personalized method for ranking or
filtering messages is required.

As one approach to learning the user’s preference, we note the site recom- 
mendation system for the WWW [3,4]. Such approaches use keywords issued 
by users or analyze the pages accessed by the user. However, in the case of e- 
mail, the only useful approach is to monitor the user’s actions in responding to 
incoming messages or issuing new messages. 

Maes proposed Email Agent [5] for e-mail personalization. It monitors the
user’s actions and the features of situations, learns the user’s habits, and suggests
or takes the nearest action it finds, based on its memory and a calculation
of the distances between a new situation and the situations in its memory. The
distance metric used is the weighted sum of distances for the features. However,
Maes’s approach is computationally expensive in learning the weights, because
many features must be memorized to calculate the distance between situations.
In addition, her approach uses only the headers of the messages and does not
consider their contents, so it fails to rank or filter messages whose importance
lies in the content. Accordingly, a method that considers not only the message’s
header but also its body is required.

Users who receive many e-mail messages daily often accumulate them in one
folder (e.g. “inbox”) to buffer the incoming information. This is because they
have little time to categorize messages into folders and it is difficult for them
to keep folders categorized continuously. These accumulated messages include
tasks and schedules, so the folder is used as a personal archive. As the messages
in the folder increase, managing them becomes more difficult. One system uses
the functions of message marking and programmable reminders [8]. However, the
system cannot mark important messages or place a reminder on them
automatically. Therefore, automatically finding the important messages, i.e. the
messages with high priority, is needed to reduce e-mail overload.

3 Automatic Priority Assignment to Messages 

The user can assign a priority to an e-mail message very accurately because 
he/she knows the background of the message and can parse the message accu- 
rately. This process, however, is too time consuming. We consider that the user 
applies the following criteria in prioritizing messages: 

— from close associates, 

— to groups the user belongs to, 

— includes topics interesting to the user, 

— message type (including event schedules and to-do items), 

— and urgency based on event date or deadline. 

As an example, an e-mail message from the user’s boss with a to-do item 
with an urgent deadline takes precedence over a message from a friend with no 
deadline. The computer has great difficulty in recreating this process because it 
fails to understand context. We propose that the frequency of communication 
is the most useful indicator of the first three criteria. A list of known partners 
(sender/receiver(s)) or topics included in each message is created and a reply
rate is determined for each partner or each topic. This measure is one that can 
be well handled by the computer. Message type is discerned by using templates. 
Templates are also used to extract dates and a simple comparison against the 
sent date provides a measure of urgency. 

3.1 Extracting Information from Messages 

The proposed method uses information extraction to find the existence of event 
schedules or to-do items, the event name, and event date or deadline from each 
message. Instead of morphological processing, keywords and character types are 
used to extract information from e-mail messages. Morphological processing can- 
not be applied to e-mail messages because most messages are very informal with 
many misspellings and grammatical lapses. The extraction module uses a
sophisticated template matching method [2]. For the experiments described in this
paper, about one thousand Japanese templates were created (see Fig. 1).






(Event_pattern)       ::=  (Inform_action_KW) (Event name) (Time_KW) (Time)
(Inform_action_KW)    ::=  “have” | “hold” | “perform” | “carry out” | ...
(Time_KW)             ::=  “at” | “on” | ...
(Request_pattern)     ::=  (Request_action_KW) (Deadline_KW) (Time)
(Request_action_KW)   ::=  “reply” | “submit” | ...
(Deadline_KW)         ::=  “by” | “not later than” | ...

Fig. 1. Templates for information extraction (translated from Japanese)



Table 1. Assigning the urgency degree from the difference between the sent date
and the deadline or event date. The units ‘h’, ‘d’ and ‘w’ in the time-difference
row denote hours, days and weeks, respectively; ‘None’ means there is no deadline
or event date

Time difference   <3h   <12h  <1d   <2d   <3d   <1w   <2w   <3w   <4w   >4w   None
Degree            1     0.9   0.8   0.7   0.6   0.5   0.4   0.3   0.2   0.1   0



The message is classified in terms of the topics derived from the events
extracted from the message body and the subject field in the message header. To
increase the usefulness of the proposed method, similar events are classified into
the same topic using keywords within the event names. For example, if the event
name is “a seminar on the Internet”, the word “seminar” in the event name also
matches a typical keyword for the topic “lecture”, so the topic of the event is
classified as “lecture”. If the event name contains different topics (e.g. A and B),
the event is classified as A, B, and A and B.
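
A minimal sketch of this keyword-based topic assignment is given below. The keyword lists are invented for illustration (the actual topic and keyword lists are not given in the text), and the handling of more than two matching topics is an assumption: the sketch simply adds one joint topic naming all of them.

```python
# Hypothetical keyword lists; the paper's actual topics and keywords are not listed.
TOPIC_KEYWORDS = {
    "lecture": ["seminar", "lecture", "tutorial"],
    "meeting": ["meeting", "conference call"],
}


def classify_topics(event_name):
    """Return the topics whose keywords occur in the event name."""
    name = event_name.lower()
    matched = sorted(
        topic for topic, keywords in TOPIC_KEYWORDS.items()
        if any(keyword in name for keyword in keywords)
    )
    # With two matching topics A and B, the joint topic "A and B" is recorded too.
    if len(matched) > 1:
        matched.append(" and ".join(matched))
    return matched


topics = classify_topics("a seminar on the Internet")
```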

The type of a message is set to 1 if the message’s content includes at
least one event schedule or one to-do item; otherwise, the type is set to zero.
To keep the system simple, urgency is derived from the remaining time, i.e. the
difference between the sent date and the deadline or event date, if any. Table 1
shows the corresponding urgency degree values. Messages with no deadline or
event date are assigned an urgency degree of zero. The highest urgency degree
is 1. For example, if the time difference is within 2 days, the urgency degree
is 0.7. The time in the Date: line of the e-mail header is taken as the sent date.
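
The lookup in Table 1 can be written as a short function. The thresholds and degrees are those of Table 1; two edge cases are not specified in the text and are assumptions of this sketch: a difference of exactly four weeks is given degree 0.1, and a deadline earlier than the sent date falls into the most urgent bin.

```python
from datetime import datetime, timedelta

# (upper bound on remaining time, urgency degree), in the order of Table 1.
URGENCY_STEPS = [
    (timedelta(hours=3), 1.0),
    (timedelta(hours=12), 0.9),
    (timedelta(days=1), 0.8),
    (timedelta(days=2), 0.7),
    (timedelta(days=3), 0.6),
    (timedelta(weeks=1), 0.5),
    (timedelta(weeks=2), 0.4),
    (timedelta(weeks=3), 0.3),
    (timedelta(weeks=4), 0.2),
]


def urgency_degree(sent, deadline):
    """Urgency degree from the time left between the sent date and the deadline."""
    if deadline is None:
        return 0.0  # no deadline or event date in the message
    remaining = deadline - sent
    for limit, degree in URGENCY_STEPS:
        if remaining < limit:
            return degree
    return 0.1  # four weeks or more


sent = datetime(2000, 6, 19, 9, 0)
degree = urgency_degree(sent, sent + timedelta(hours=36))
```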



3.2 Monitoring Receiver’s Action History 

The priority of an e-mail message should be assigned so as to match the user’s 
personal preferences. Rather than forcing the user to answer many questions, the 
method observes the replies the user makes to the messages he/she receives and 
the new messages created by the user. This is done in the monitoring module. 
Our assumption is that sending a message (either a reply or a new message) 
indicates that the user places some priority on two items: the topics and the 
partners. 






[Figure 2: architecture diagram. Monitoring of the action history (replied/newly
sent versus received messages) yields reply rates to partners (senders), partners
(receivers) and topics. Extraction from incoming messages yields the urgency
(within 3 hours, within 12 hours, ..., over 4 weeks) and the message type
(information of event schedules, to-do items to the user, others). Both feed the
calculation of priority.]

Fig. 2. Architecture for assigning a priority to each incoming message



Most users have particular interests and rank some topics more highly than
others. The monitoring module assesses this preference by calculating the reply
rate for each topic, i.e. the ratio of replied-to or newly created messages to
received messages containing the topic. For example, if 48 incoming messages and
36 reply messages have the topic “lecture”, the reply rate of the topic “lecture”
is 36/48 = 0.75. To allow the module to handle new messages and replies in the same
way, a new message is treated as a reply to a message with the same topic.

In a similar way, the monitoring module determines the reply rate for each 
partner. If the message has an unknown partner, a new entry is created in the 
reply rate table. If the user broadcasts a new message to several partners, each 
partner is regarded as having sent a message to which the user is replying. 
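
The bookkeeping behind the reply rates can be sketched as a small table of counters, usable for topics and for partners alike. The class and method names are hypothetical. How a newly created message updates the counts is an interpretation of the text: the sketch counts it as one received and one replied-to message for the same key.

```python
from collections import defaultdict


class ReplyRateTable:
    """Counts received and replied-to messages per key (a topic or a partner)."""

    def __init__(self):
        self.received = defaultdict(int)
        self.replied = defaultdict(int)

    def record_received(self, key):
        self.received[key] += 1

    def record_reply(self, key):
        self.replied[key] += 1

    def record_new_message(self, key):
        # Interpretation: a new message counts as a reply to a received
        # message with the same topic or partner.
        self.received[key] += 1
        self.replied[key] += 1

    def rate(self, key):
        received = self.received.get(key, 0)
        if received == 0:
            return 0.0  # unknown topic or partner
        return self.replied.get(key, 0) / received


table = ReplyRateTable()
for _ in range(48):
    table.record_received("lecture")
for _ in range(36):
    table.record_reply("lecture")
lecture_rate = table.rate("lecture")
```

With 48 received and 36 replied-to messages, `lecture_rate` reproduces the 0.75 of the example in the text.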



3.3 Calculation of Priority 

A calculation module is proposed that calculates the priority of an arriving e- 
mail message in cooperation with the extraction module and the monitoring 
module. Fig. 2 shows the complete architecture of the proposed method. Bars in 
the figure represent the ratio for reply rates of partners (sender/receiver(s)) and 
topics. 







The extraction module assigns the message type and calculates the degree 
of urgency of the message from the difference between the message sent time 
and the deadline or the event date within the e-mail message. The monitoring 
module determines the highest of the reply rates of the topics included in the 
message and the reply rates for the partners of the message. The calculation 
module computes the priority of a new incoming e-mail message by using the 
reply rates, the message type, and the degree of urgency. The priority of the 
message, P, is calculated from the value of each feature, V_i, according to the
following equation:

$$P = a + \sum_i W_i V_i \qquad (1)$$

where a is a constant and W_i is a weight that can be set to reflect personal
preferences.
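Equation (1) can be illustrated with a short sketch. The weights and the constant are those reported in Table 2 of this paper, and the 0.6 threshold is the one used in the filtering experiment; the feature labels and the feature values of the sample message are hypothetical, since the numeric encoding of each feature is not reproduced here.

```python
# Weights and constant as reported in Table 2; the dictionary keys are
# labels chosen for this sketch.
WEIGHTS = {
    "topic": -0.0549,
    "sender": 0.3477,
    "receivers": 0.2768,
    "message_type": 0.1713,
    "urgency": 0.4431,
}
CONSTANT = 0.2593


def priority(features, weights=WEIGHTS, constant=CONSTANT):
    """Equation (1): P = a + sum_i W_i * V_i."""
    return constant + sum(weights[name] * value for name, value in features.items())


# Hypothetical feature values for one incoming message.
message = {
    "topic": 0.75,        # highest reply rate among the message's topics
    "sender": 0.5,        # reply rate for the sender
    "receivers": 0.5,     # reply rate for the receiver(s)
    "message_type": 1.0,  # assumed numeric code for the message type
    "urgency": 1.0,       # assumed urgency degree
}
p = priority(message)
is_important = p > 0.6  # threshold used for filtering in the experiment
```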



3.4 Determining the Weights of Features 



To determine the weights of features we start by manually ranking a set of N 
messages from 1 (highest priority) to N (lowest priority). We calculate the pri- 
ority of a manually ranked message, P', using the following equation: 



P' = …    (2)



where O is the rank of the message. If the priority of a message is P', the 
following equation holds true: 



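$$P' = a + \sum_i W_i V_i \qquad (3)$$

where a is a constant and W_i is a weight that can be set to reflect personal
preferences. Because P' and the feature values V_i are known for every manually
ranked message, the constant a and the weights W_i are the only unknowns in this
equation and can be estimated by multiple-regression analysis.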
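The fitting step can be sketched as an ordinary least-squares problem. The sketch below is an assumption-laden illustration, not the authors' procedure: the paper does not say which regression software was used, the function name is hypothetical, and the target priorities P' are taken as given inputs rather than derived from ranks.

```python
def fit_weights(feature_rows, targets):
    """Least-squares fit of P' = a + sum_i W_i * V_i.

    Solves the normal equations (X^T X) b = X^T y by Gauss-Jordan
    elimination and returns (a, [W_1, ..., W_k]).
    """
    rows = [[1.0] + list(r) for r in feature_rows]  # leading 1 models the constant a
    k = len(rows[0])
    # Augmented matrix [X^T X | X^T y].
    aug = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * y for r, y in zip(rows, targets))]
        for i in range(k)
    ]
    for col in range(k):
        # Partial pivoting keeps the elimination numerically stable.
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("features are linearly dependent; cannot fit weights")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(k):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [x - factor * p for x, p in zip(aug[r], aug[col])]
    solution = [aug[i][k] / aug[i][i] for i in range(k)]
    return solution[0], solution[1:]


# Toy data with two features, generated from a = 0.2 and W = (0.5, 0.3).
rows = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (0.5, 0.25)]
targets = [0.2 + 0.5 * x + 0.3 * y for x, y in rows]
a, w = fit_weights(rows, targets)
```

Because the toy targets are generated without noise, the fit recovers the generating constant and weights up to floating-point error; with real manually ranked messages the fit would only approximate the targets.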

4 Experiment 

An experiment was conducted to check the validity of the proposed method. 2113 
messages sent to and 590 messages sent by the author were used as the training 
data. 250 of these messages (analysis set) were manually ranked and the weights 
of features were calculated by multiple-regression analysis. Another 236 mes- 
sages received by the author were prepared as the test set and manually ranked. 
The priority of each of these messages was automatically assigned and the cor- 
responding rank determined. The priorities of the messages in the analysis set 
were also automatically assigned. 149 topics were used in the experiment and 
each topic had a mean of 5.8 keywords. Urgency degree was assigned following 
Table 1. 



Table 2. Weights determined by multiple-regression analysis

Features                  Weights
Constant                   0.2593
Topic                     -0.0549
Partner (Sender)           0.3477
Partner (Receiver(s))      0.2768
Message type               0.1713
Urgency degree             0.4431




Fig. 3. Histograms of ranking differences between the manually ranked data sets
and the automatically ranked data sets: (a) analysis set, (b) test set
(horizontal axis: ranking differences)



Table 2 shows the weights of features determined by multiple-regression analysis.
The messages of the data sets (the analysis set and the test set) were
automatically ranked according to the priority assigned to each message by using
the weights. Fig. 3 shows the histograms of ranking differences between the
manually ranked data sets and the automatically ranked data sets. Table 3 shows
the performance of the proposed method in automatically ranking and filtering
important messages (those that have a priority over 0.6). As the evaluation
measure, we used the mean value and the standard deviation of ranking
differences between the results of manual and automatic ranking. These tables
and histograms indicate that, for the test set, the rankings produced by the
proposed method are close to the manual rankings. That is, the proposed method,
based on information extraction from messages, the reply rates of partners and
topics, and the weights determined by multiple-regression analysis, is valid.
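The evaluation measures can be sketched as below. The function names are hypothetical, and two details are assumptions because the paper does not state them: ranking differences are taken as absolute values, and the standard deviation is the population form. The recall and precision example uses the test-set counts from Table 3 (94 manually important messages, 99 automatically selected, 76 in common); the message identifiers are synthetic.

```python
from statistics import mean, pstdev


def ranking_differences(manual_rank, automatic_rank):
    """Absolute difference between manual and automatic rank, per message id."""
    return [abs(manual_rank[m] - automatic_rank[m]) for m in manual_rank]


def recall_precision(relevant, retrieved):
    """Recall and precision of a retrieved set against a relevant set."""
    hits = len(relevant & retrieved)
    return hits / len(relevant), hits / len(retrieved)


# Tiny ranking example with three messages.
diffs = ranking_differences({"a": 1, "b": 2, "c": 3}, {"a": 2, "b": 1, "c": 3})
mean_diff, std_diff = mean(diffs), pstdev(diffs)

# Synthetic ids reproducing the test-set counts of Table 3.
manually_important = set(range(94))             # 94 messages
automatically_selected = set(range(18, 117))    # 99 messages, 76 shared
recall, precision = recall_precision(manually_important, automatically_selected)
```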

In addition, we compared the proposed method to the common method of
filtering using one term. As an example, we assumed that the user would feel
that messages from his boss have the highest priority and would filter the
messages using just the boss’s address. The test set (236 messages) contained 66
messages from the author’s superiors. We determined how many of the messages



Table 3. Evaluation of automatic ranking and filtering

                     Ranking difference                 Filtering
Data set             Mean value   Standard deviation    Recall             Precision
Analysis set (250)   41.2         34.9                  67.0 % (67/100)    78.8 % (67/85)
Test set (236)       33.1         29.1                  80.9 % (76/94)     76.8 % (76/99)


Table 4. Comparison between common method and proposed method

             Common method       Proposed method
Messages     Hit     Ratio       Hit     Ratio
66           24      36.4 %      45      68.2 %



in the filtered set (66 messages) and the top 66 entries of the automatically
ranked set were present in the top 66 manually ranked messages. As the
evaluation measure we used R-Precision because this metric has only one value
and allows methods to be compared intuitively. Table 4 shows the results. Simple
filtering found only 24 of the 66 most important messages while the proposed
method found 45. This comparison indicates that our proposed method is superior
to simple filtering.
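R-Precision as used in this comparison can be sketched in a few lines. The function name and the message identifiers are hypothetical; only the counts (66 relevant messages, 24 and 45 hits) come from Table 4.

```python
def r_precision(relevant, ranked):
    """Fraction of the top-R ranked items that are relevant, with R = len(relevant)."""
    r = len(relevant)
    return len(set(ranked[:r]) & set(relevant)) / r


# Synthetic ids: messages 0..65 are the top 66 manually ranked messages.
top_manual = list(range(66))
# Simple filtering: 24 of its 66 messages are among the top 66.
common_method = list(range(24)) + list(range(100, 142))
# Proposed method: 45 of its top 66 messages are among the top 66.
proposed_method = list(range(45)) + list(range(100, 121))

common_ratio = r_precision(top_manual, common_method)      # 24 / 66
proposed_ratio = r_precision(top_manual, proposed_method)  # 45 / 66
```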

Table 5 shows the recall and precision for the extraction of topics, message
type, and urgency. Urgency represents the value calculated from event dates
and deadlines. The weights were calculated from the training set and so contain
extraction errors. Even so, Table 4 shows that the proposed method is superior
to simple filtering.

5 Discussion 

The above results demonstrate that the proposed method can be put into prac- 
tical use for ranking or filtering the many messages received. Its excellent per- 
formance is due to its use of the information in the message body. As shown 
in Table 2, the weight for urgency is the highest of the weights determined by 
multiple-regression analysis. Urgency is obviously one of the most important at- 
tributes and information on this attribute is most often found in the body, not 
the header. The value of the weight for urgency reveals this information is criti- 
cal in achieving a practical level of performance. The information of the partner 
was also found to be important. The idea that frequent communication from or 
to people should indicate high priority is supported. 

Inversely, the weight for the topic is small. The information of topics does
not contribute much to system performance because the current precision of
information extraction about topics is not high. Keyword matching by only
characters fails to extract correct topics when some of the characters are Katakana,



Table 5. Recall and precision for information extraction

               Topics                                Message type                      Urgency
Data set       Recall              Precision            Recall            Precision          Recall            Precision
Analysis set   91.6 % (837/914)    91.5 % (837/915)     96.5 % (136/141)  98.6 % (136/138)   87.9 % (58/66)    100 % (58/58)
Test set       90.2 % (754/836)    84.1 % (754/897)     99.3 % (148/149)  100 % (148/148)    94.9 % (74/78)    100 % (74/74)
Sum            90.9 % (1591/1750)  87.8 % (1591/1812)   97.9 % (284/290)  99.3 % (284/286)   91.7 % (132/144)  100 % (132/132)



Japanese letters used to represent a word of foreign origin. Improving the pre- 
cision of information extraction about topics would yield more optimal topic 
weighting. The message type also does not contribute much to system perfor- 
mance. We consider that this is because a higher level of analysis is needed. For 
example, the sentence “Please indicate the desired dates if you have not already 
done so.” does not include a to-do item if the receiver has already informed the 
sender. It is difficult to correctly acquire the user’s state. 

Another benefit of the proposed method is determination of the weights of 
features. On the whole, the histograms of the analysis set and the test set show 
the same tendency (see Fig. 3). That is, the reply rates of partners and topics and 
the weights determined by multiple-regression analysis are effective for realizing 
personalization. This result reveals that if some of the archived messages received 
and sent by the user are ranked manually and the weights determined, these 
weights would be useful in realizing automatic message ranking that reflects 
the user’s opinion. We consider that the weights will remain valid for the user
until his/her circumstances change dramatically. Daily transitions in the degree
of interest and the relations with partners can be absorbed by the reply rates.
In other words, the proposed method saves time by eliminating the need to
recalculate the weights.

We note that information extraction errors with regard to the type and the 
urgency of messages degrade the accuracy of the weights and so lower system 
performance. For example, the histogram in Fig. 3(a) indicates that in terms 
of ranking and filtering messages, the proposed method did not process the 
messages in the analysis set as well as the messages in the test set even though the 
weights were calculated by applying multiple-regression analysis to the analysis 
set. We checked this effect more closely and found that the analysis set contained 
more messages that showed high levels of urgency but that were inaccurately 
assessed by the proposed method. Obviously, improving recall and precision 
for information extraction is an important problem. Decreasing the number of 
information extraction errors would make the weights more accurate. 

Our future work is to help in ranking the messages manually by allowing the 
user to directly manipulate the messages. Direct manipulation could be used to 
modify the weights over time. Another future work is to examine the contribution 



of other message features to priority. The proposed method makes it easy to
add features because it uses multiple-regression analysis. In addition, more topics
must be prepared that work effectively for various people.

6 Conclusion 

A method of automatically assigning a personalized priority to each arriving 
message that uses information extraction and the user’s action in replying to 
incoming messages or sending new messages was proposed. The priority is calcu- 
lated as the weighted sum of five features: the message type, the urgency and the 
topics as extracted from the message body and the partners (sender/receiver(s)) 
as found in the message header. The weights are calculated by applying multiple- 
regression analysis to a set of manually ranked messages. The proposed method 
allows e-mail messages to be ranked automatically according to the content of 
each message and the user’s preferences as shown by his/her past history. 



References 

1. Crocker, D.: Standard for the Format of ARPA Internet Text Messages, STD 11,
RFC 822, UDEL, 1982.

2. Hasegawa, T. and Takagi, S.: Extraction of Schedules and To-Do Items from
E-mail Messages by Identifying Message Structures and Using Language Expressions
(in Japanese), IPSJ Transactions, Vol.40, No.10, pp.3694-3705, 1999.

3. Levy, A. Y., Rajaraman, A., and Ordille, J. J.: Query-Answering Algorithms for
Information Agents, In Proceedings of the Thirteenth National Conference on
Artificial Intelligence (AAAI-96), pp.40-47, 1996.

4. Lieberman, H.: Letizia: An Agent that Assists Web Browsing, In Proceedings of
the Fourteenth International Joint Conference on Artificial Intelligence
(IJCAI-95), pp.924-929, 1995.

5. Maes, P.: Agents that Reduce Work and Information Overload, Communications
of the ACM, Vol.37, No.7, pp.31-40, 1994.

6. Microsoft Corporation: Microsoft Outlook Express Reviewers Guide,
http://www.microsoft.com/ie/ie40/oe/oepress-f.htm

7. Netscape Communications Corporation: Guide to What’s New: Netscape
Communicator,
http://www.netscape.com/comprod/products/communicator/index.html

8. Whittaker, S. and Sidner, C.: Email Overload: Exploring Personal Information
Management of Email, In Proceedings of the Conference on Human Factors in
Computing Systems (CHI'96), pp.276-283, 1996.



Information Extraction for Validation of Software 
Documentation 



Patricia Lutsky 
Arbortext, Inc. 

1000 Victors Way, Ann Arbor, MI 48108 USA 
plutsky@arbortext.com



Information extraction techniques can be used to improve the quality of
software user manuals and online help systems. These documents are often
formatted as repeated sections that have similar heading structure, with free-text
inside each section. XML (extensible markup language) enables document
designers to design rich tag sets where tags for section headings contain
information about each section. This contextual information, coupled with the
fact that the free-text portions of the documents use a limited sublanguage,
means that simple natural-language-based techniques can be used to extract facts
from online documents. The SIFT document parser system has demonstrated
the potential for this type of extraction in the area of software document
validation.



1 Introduction 

User documents such as reference manuals and online help systems are central 
components of software systems. It is important that this documentation is accurate so 
that it will assist, rather than hinder, a user that is trying to understand the system. 
While there are various CASE (computer-aided software engineering) tools available 
to assist in generating and maintaining software tests, the potential for building tools 
to assist in checking the documentation has not been exploited. Tools such as spelling 
checkers and grammar checkers can be used by technical writers for the initial 
development of manuals. But since software systems change significantly over time, 
if the documentation is not rigorously updated with each new version of the system, it 
will develop regressions. 

Often the major goal of writing groups is to describe the new features of a system, 
with verification of existing systems left as a time-available activity. Some 
verification can be done mechanically, such as checking that argument lists match the 
system header files, or extracting example source code fragments from the 
documentation to be sure the examples still work properly. However, information 
extraction from the natural language text of the documents can greatly expand the 
potential areas that can be verified. While complete understanding of software 
manuals is still a research goal, currently available natural-language-processing 
technology can be used to extract specific facts from structured documents. The 



R. Loganantharaj et al. (Eds.): IEA/AIE 2000, LNAI 1821, pp. 583-590, 2000.
© Springer-Verlag Berlin Heidelberg 2000




increasing use of structured markup such as XML enables domain-specific 
information extraction from texts. Even if the entire task of verification can not be 
automated, information extraction can facilitate the process. 

I have built a proof-of-concept validation tool, SIFT, that can extract information 
from online documents and use that information for validation. SIFT is successful 
because it looks for specific types of information. It does not try for full-text 
understanding, thereby bypassing issues of discourse structure. It works on individual 
sentences, so only concepts that can be expressed in individual sentences can be 
extracted. However, many important verifiable concepts are expressed this way, such 
as required privileges for operations, parameter interdependencies, and allowable 
input value ranges or return codes for interfaces. 



2 Information Extraction from Structured Documents 

Information extraction concerns extracting specific facts from documents. [1] Unlike
automated indexing, the precise semantic content of sentences must be understood to 
extract facts. In online reference documents, there are often specific types of facts 
that could be used for validation if they could be extracted from the documents. 

Software reference manuals are increasingly available online in Extensible Markup 
Language (XML). These reference documents are often semi-formatted; they are 
organized into specific repeated sections for each entity being described with free-text 
in each section. XML is a simple dialect of SGML (standard generalized markup 
language) that has been developed by a working group of the World Wide Web 
Consortium. It has been designed for ease of implementation and for interoperability 
with both SGML and HTML. [2] A key advantage of XML over HTML is that XML 
allows users to create custom tags. Also, it separates content information from 
presentation formats. Major vendors are providing support for XML in current or 
future versions of their web products. 

XML provides a mechanism, the document type declaration, for defining 
constraints on the logical structure of a document. Within the definition is the markup 
declaration, which declares the type of tags available for a class of documents. Tags 
can have attributes associated with them, where domain-specific information can be 
specified in the markup. For instance, in an online help system, there can be a 
<function> tag for the headings of routine descriptions and a <parameter> tag for the
description of each argument of a routine. Further, the <parameter> tag can have 
attributes such as whether the parameter is optional or required. The processing of 
free text inside these tags can benefit from the contextual knowledge of which 
routine, or even which parameter of which routine, the text describes. 
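The way markup supplies context for each piece of free text can be sketched with Python's standard XML library. The document fragment, the tag names, the attribute names, and the routine name below are all hypothetical and chosen only to mirror the <function> and <parameter> example above; SIFT itself was not implemented this way.

```python
import xml.etree.ElementTree as ET

# Hypothetical help-system fragment; tag and attribute names are illustrative.
DOC = """
<help>
  <function name="create_mailbox">
    <parameter name="bufquo" required="no">The maximum value is 65355.</parameter>
    <parameter name="chan" required="yes">Receives the channel number.</parameter>
  </function>
</help>
"""


def sentences_with_context(xml_text):
    """Yield (routine, parameter, required, text) for each parameter description."""
    root = ET.fromstring(xml_text)
    for function in root.iter("function"):
        for parameter in function.iter("parameter"):
            yield (
                function.get("name"),
                parameter.get("name"),
                parameter.get("required") == "yes",
                (parameter.text or "").strip(),
            )


contexts = list(sentences_with_context(DOC))
```

Each tuple carries the routine and parameter that a sentence describes, so a later parsing step does not have to recover that information from the sentence itself.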

In addition to using the header tag information for the context of sentence 
processing, parsing of online reference manuals is also simplified because the text 
inside the free-text sections uses a restricted sublanguage [3] [4]. This allows simple 
domain-specific parsing techniques to be effective. A sublanguage is a semantically 
constrained version of a natural language spoken by a particular group of people. A 
sublanguage is not a subset of a natural language, but rather has its own grammar that 
reflects the way the group of people communicates. 




Further, the style of software reference manuals facilitates their automatic 
processing. Linguistic constructs such as puns and fanciful metaphors are not used, 
and the tone is simple and uniform. Specific concepts tend to be described the same 
way throughout a document. Even if writing groups do not explicitly use a style 
manual, documents used in software engineering will have consistent style because as 
writers add sections to documents they try to match the style of existing sections. 



3 SIFT 

SIFT [5], which stands for "specification information from text," is a natural- 
language-based information extraction tool that can extract information from the free- 
text portion of semi-formatted documents. It uses simple techniques for information 
extraction from texts, yet has been able to improve tests for the domains of OpenVMS
operating system testing [5] and XCON expert system database testing [6] [7].

Figure 1 shows the architecture of SIFT. The input to the system is software 
documentation; the output is a canonical form of test information that can be 
transformed into additions to a test system. These additions will take different forms 
depending on the test system. SIFT contains four modules. These include both 
domain-independent components (a testing knowledge module and a general purpose 
parser) and domain-dependent components (a sublanguage grammar and a domain 
model). The domain-independent modules are robust enough to work for multiple 
domains so that a tester wanting to use the SIFT document parser in a new domain 



Fig. 1. Architecture of SIFT (structured documents are processed by the
domain-independent modules, testing knowledge and parser, and the
domain-dependent modules, sublanguage grammar and domain model, to produce a
canonical form of testing information)



only has to develop the domain model and sublanguage grammar based on the 
specific way that information is expressed in the documents. 







3.1 Input and Output 

The input to SIFT is a structured online document. SIFT is most valuable for 
documents that have a long lifespan, where the system being described will be 
continually adding new features. That way, the grammar can be written once, and 
then run repeatedly on the document as it changes with new versions. 

The output of the SIFT document parser is a canonical form of the relevant 
information that can be easily translated into the correct format for input to a test 
system. For example, if allowable limits on an integer parameter are being extracted, 
the sentence 

The maximum value you can specify with the BUFQUO 
argument is 65355. 

is translated to a canonical form such as 

The maximum value for BUFQUO is 65355. 

and an encoded canonical form such as 

(maximum_value BUFQUO 65355) 

that can be mechanically translated into the format used by the verification system. 
The required output format is described as part of the sublanguage grammar for the 
domain. 
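The translation from sentence to canonical form can be sketched for the BUFQUO example. A single regular expression stands in here for one rule of a sublanguage grammar; this is a deliberate simplification, since SIFT uses an ATN-based parser rather than regular expressions, and the function name is hypothetical.

```python
import re

# One hand-written pattern standing in for a sublanguage grammar rule.
MAX_VALUE = re.compile(
    r"The maximum value you can specify with the (?P<arg>\w+) argument is (?P<value>\d+)\."
)


def to_canonical(sentence):
    """Return (readable form, encoded form), or None if the sentence does not match."""
    normalized = " ".join(sentence.split())  # undo line wrapping in the source text
    match = MAX_VALUE.fullmatch(normalized)
    if match is None:
        return None
    arg, value = match["arg"], int(match["value"])
    return (
        f"The maximum value for {arg} is {value}.",
        f"(maximum_value {arg} {value})",
    )


canonical = to_canonical(
    "The maximum value you can specify with the BUFQUO\nargument is 65355."
)
```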

3.2 Sentence Types 

In SIFT, a sentence type is a group of sentences of a sublanguage that are structurally 
the same. For example, the following sentences are of the same type: 

The box is on the counter 
The pen is on a box 

The shoe is under the counter 

A sentence type can be identified with a grammar, such as the following description of the
three example sentences:

NP is PREP NP 

Components of the grammar may be domain-specific as in the following sentence 
type description: 

PRIVILEGE is required to VERB NP 

where PRIVILEGE is a noun phrase describing an OpenVMS [8] operating system 
privilege. Two corresponding sentences are 

SYSNAM privilege is required to specify executive or 
kernel mode access for a logical name table. 

SYSPRV is required to specify the system directory
table.
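A domain-specific sentence type such as "PRIVILEGE is required to VERB NP" can be approximated as a pattern whose first slot is checked against a domain lexicon. The sketch below is an assumption-based illustration: the two-entry privilege set stands in for a real domain model, the noun phrase is captured as plain text rather than parsed, and the names are hypothetical.

```python
import re

# Hypothetical lexicon of privilege names; a real domain model would supply these.
PRIVILEGES = {"SYSNAM", "SYSPRV"}

PRIVILEGE_REQUIRED = re.compile(
    r"(?P<privilege>[A-Z]+)(?: privilege)? is required to "
    r"(?P<verb>\w+) (?P<object>.+?)\s*\.$"
)


def match_privilege_sentence(sentence):
    """Return the slots of the sentence type, or None if the sentence is another type."""
    normalized = " ".join(sentence.split())
    m = PRIVILEGE_REQUIRED.match(normalized)
    if m is None or m["privilege"] not in PRIVILEGES:
        return None
    return {"privilege": m["privilege"], "verb": m["verb"], "object": m["object"]}


first = match_privilege_sentence(
    "SYSNAM privilege is required to specify executive or\n"
    "kernel mode access for a logical name table."
)
second = match_privilege_sentence(
    "SYSPRV is required to specify the system directory\ntable."
)
```

Checking the first slot against the lexicon is what makes the sentence type domain-specific: a structurally similar sentence about something other than a known privilege is rejected.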




3.3 SIFT Modules 

The domain-independent general-purpose parser has four parsing steps, and the 
sublanguage grammar parts correspond to them. The four parts of the grammar are: 

- preprocessing directives 

- phrase structure grammar 

- heuristics for identifying sentence types 

- heuristics for extracting semantic information. 

The grammar only covers the structures of the sentence types that are used to convey 
testable information in the target document. 

The corresponding parser is based on an ATN (augmented transition net) [9]. It 
looks at each appropriate section of the document and its contextual information. 
Within each section, it works on a sentence-by-sentence basis, although a "sentence"
of the sublanguage may only be a sentence fragment of English. If it is able to parse a 
sentence and derive a semantic representation for it, it returns the corresponding 
semantic expression. If not, it simply ignores the sentence and moves on to the next 
one. 

The phrase structure output is a parse tree. Its format is domain specific and may 
contain semantic information as well as syntactic groupings. This tree is passed to 
two types of heuristics: one set of heuristics identifies the sentence type based on the 
phrase structure description of the sentence and the other set of heuristics converts the 
semantic contents of the sentence to a domain-specific useful form. 

The second set of heuristics takes advantage of the XML-encoded knowledge in 
the document for semantic processing. This knowledge can be used in reference 
resolution and in determination of the correct values for the argument structure of 
lexical items. It can also help with missing references in informal documents, such as 
if the sentence fragment 

Only relevant for software. 

occurs in a specification document. The missing sentence subject is obtained from 
the heading tag of the section for the attribute that is being described. 
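The sentence-by-sentence processing, including the use of a section heading as a missing subject, can be sketched as a small loop. This is a simplified stand-in for the ATN parser and its heuristics: the single rule, the heading name "vendor_id", the output tuple format, and the naive sentence splitter are all assumptions made for illustration.

```python
import re

RULES = [
    # Subject-less fragment: the section heading supplies the missing subject.
    (
        re.compile(r"Only relevant for (?P<scope>[\w ]+)\."),
        lambda m, heading: ("relevant_only_for", heading, m["scope"]),
    ),
]


def extract(sections):
    """sections: iterable of (heading, free_text) pairs.

    Sentences that no rule can parse are skipped silently.
    """
    facts = []
    for heading, text in sections:
        # Naive splitter: break after each period followed by whitespace.
        for sentence in re.split(r"(?<=\.)\s+", text.strip()):
            for pattern, build in RULES:
                m = pattern.fullmatch(sentence)
                if m:
                    facts.append(build(m, heading))
                    break
    return facts


facts = extract(
    [("vendor_id", "Only relevant for software. See the release notes for details.")]
)
```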

The domain model is based on a linguistic formalism, the generative lexicon [10]. 
By using a linguistic formalism, the language-related information is combined with 
general semantics of domain model entries. 

The other domain-independent module is the testing knowledge module. It 
contains general knowledge about testing and common test case scenarios. Currently 
it is used only when translating canonical sentences into the correct format for 
additions to the verification system. In the future its role might be expanded to assist 
further in SIFT’s processing of the document. 

The SIFT system was written in Common LISP and run on Microsoft Windows.
The phrase structure grammar, lexicon, and domain model are all separate LISP
structures. The grammar heuristics for identifying sentence types and for extracting
semantic information from parse trees are encoded as LISP functions that get invoked
from SIFT.





4 Preliminary Results 

The SIFT system has already demonstrated its potential in automating the generation 
of software tests [5]. To show its ability to verify the correctness of a document, the 
online help system for the Adept series of structured editors [11] was evaluated. In 
particular, the description of a large section of the system, the repository API, was 
analyzed. This is a set of routines for accessing documents and document fragments
that are stored in an external repository. 

A developer had noticed that one of the routines had an incorrect return code listed 
on an error condition, and there was a concern that other return codes might also be 
incorrect. The SIFT system was used to extract all sentences describing return codes 
from repository routines so they could be verified for accuracy. 

The SIFT grammar was written based on the first 15 routine descriptions, and then
run on the remaining 46 routines. All of the sentences had the basic form

If the operation fails, the function returns a -1.

However, some of the sentences had additional phrases, such as

If the operation fails, the function sets $ERROR and 

returns NULL, 
or 

If dobj is invalid, or the operation fails, the 

function returns null_oid. 

Therefore, the sentence type grammar closely matches the first example sentence, 
with several optional grammar clauses. 

At first, the SIFT grammar covered the sentences about error return codes from 21
of the remaining 46 descriptions. However, with a minor change to the grammar, 
SIFT processing was enhanced to enable retrieval of information about 38 of the 46 
routines. The remaining 8 routines would have required a major change to the 
grammar, so they were just checked separately. Once extracted, it was a simple task 
to verify that the documented return codes matched those actually returned by the 
software routines. 
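The return-code check described above can be sketched as a pattern with optional clauses followed by a comparison against the codes the software actually returns. As before, a regular expression substitutes for SIFT's grammar; the routine names and the table of actual return codes are invented for illustration and do not come from the Adept documentation.

```python
import re

# Basic sentence form with two optional clauses, mirroring the three examples.
RETURN_ON_FAILURE = re.compile(
    r"If (?:(?P<arg>\w+) is invalid, or )?the operation fails, the function "
    r"(?:sets (?P<variable>\S+) and )?returns (?:a )?(?P<code>\S+?)\.?$"
)


def documented_return_code(sentence):
    """Return the documented failure code, or None if the sentence is not covered."""
    match = RETURN_ON_FAILURE.match(" ".join(sentence.split()))
    return match["code"] if match else None


def find_mismatches(documented, actual):
    """Map routine name to (documented, actual) wherever the two disagree."""
    return {
        name: (code, actual.get(name))
        for name, code in documented.items()
        if actual.get(name) != code
    }


# Hypothetical routine names and actual return codes.
HELP_TEXT = {
    "open_object": "If the operation fails, the function returns a -1.",
    "get_name": "If the operation fails, the function sets $ERROR and returns NULL.",
    "get_parent": "If dobj is invalid, or the operation fails, the function returns null_oid.",
}
ACTUAL = {"open_object": "-1", "get_name": "NULL", "get_parent": "-1"}

documented = {name: documented_return_code(s) for name, s in HELP_TEXT.items()}
mismatches = find_mismatches(documented, ACTUAL)
```

Routines whose sentences the pattern does not cover yield None and would be checked by hand, which corresponds to the 8 routines that were checked separately.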

This shows the potential for using information extraction in document checking. 
Additional parts of the documentation will be analyzed for further types of facts that 
would be useful for validation. 

5 Related Work 

The growth of the world wide web has increased the number of documents that are 
available in online structured form tremendously, and there has been considerable 
interest in extracting information from these documents. [12] However, while some 
researchers have included natural language processing in automated software 
engineering, the potential for using text-based systems in software engineering has 
not been fully explored. Projects such as that of Maarek and Berry [13] and Goldin 
and Berry [14] work on requirement documents, in particular to locate major 
concepts. The information extracted from the documents is used to better understand 
and structure the requirements. 




The most similar system to SIFT is KITSS, developed at AT&T Bell 
Laboratories [15]. It also uses natural language parsing techniques on existing texts 
concerning testing. For KITSS, the texts are informal English descriptions of tests 
that have been performed on telephony systems. The task is to translate these 
descriptions into formal, executable scripts. KITSS encounters problems similar to
those SIFT faces in handling the sublanguage of existing texts, although their documents
explicitly describe tests. However, while the domain is similar, the tasks differ in
that their output must meet very specific standards. Also, while SIFT’s task is to find
explicit facts in a document, where many other facts might go unprocessed, KITSS’s 
task is to completely translate each test script into a test procedure. 

Few commercial CASE products include natural language processing. The SoDA 
tool from Rational, Inc [16] incorporates documents into Rational’s software 
engineering tool suite, although in the other direction: it extracts information from 
software development tools that is then used to generate software documentation. 

6 Conclusions 

SIFT is an effective tool, but there are requirements for the types of documents that 
SIFT can work on. SIFT's value is its ability to continue to extract facts from 
documents as they change over time. As such, SIFT only makes sense for documents 
that will be maintained for new versions of a software system. Additionally, the 
document must be semi-formatted: organized into sections with separate sections for 
each entity under test. The document sections must be identifiable to the 
preprocessor, such as is the case with section header tags in the XML format. The 
document must be written in simple, declarative, consistent style, and must use a 
sublanguage, preferably one where words have limited meanings. 

Valuable facts can be extracted from online documents encoded in XML. The
domain-specific tags that XML enables allow documents to contain contextual
information that facilitates information extraction from semi-formatted documents.
And, since the free-text of the document is written in a restricted sublanguage, simple 
domain-oriented parsing techniques are sufficient for information extraction. If 
document validation is incorporated into the standard procedure of a technical writing 
group, stylistic conformance will increase. Also, rich domain-oriented XML tag sets 
can be developed that will ease information extraction from these documents; tag set
design teams can include information extraction experts in order to ensure the
maximum usefulness of the document markup.



References 

1. Cowie, J. & Lehnert, W. 1996. Information Extraction. Communications of the
ACM 39, 80-91.

2. World Wide Web Consortium 1998. Extensible Markup Language (XML) 1.0.
W3C Recommendation 10-February-1998. http://www.w3.org/TR/REC-xml.

3. Grishman, R. & Kittredge, R. (Eds.). 1986. Analyzing language in restricted
domains: Sublanguage description and processing. Hillsdale, NJ: Lawrence
Erlbaum Associates.




4. Kittredge, R., & Lehrberger, J. (Eds.). 1982. Sublanguage: Studies of language
in restricted semantic domains. New York: Walter de Gruyter.

5. Lutsky, P. 1998. Automating natural-language-based processes of software
testing. Unpublished PhD dissertation, Brandeis University.

6. Lutsky, P. 1995. Automating Testing by Reverse Engineering of Software
Documentation. Proceedings of the Second Working Conference on Reverse
Engineering.

7. Schlimmer, J. 1991. Learning meta knowledge for database checking.
Proceedings of AAAI 91. 335-340.

8. Digital Equipment Corporation 1988. OpenVMS System Services Reference
Manual Version 5.0.

9. Woods, W. 1970. Transition network grammars for natural language analysis.
Communications of the ACM. 13, 591-606.

10. Pustejovsky, J. 1995. The Generative Lexicon. Cambridge, MA: MIT Press.

11. Arbortext, Inc. 1999. Adept Online Help, Version 8.2.

12. Knoblock, C. et al. (Eds.). 1998. Proceedings of 1998 Workshop on AI and
Information Integration. AAAI Press.

13. Maarek, Y.S. & Berry, D.M. 1989. The use of lexical affinities in requirements
extraction. Proceedings of the 1989 Conference on Software Specification and
Design. 196-202.

14. Goldin, L. & Berry, D. 1997. AbstFinder, A prototype natural language text
abstraction finder for use in requirements. Automated Software Engineering. 4,
375-412.

15. Kelly, V.E. & Jones, M.A. 1993. KITSS: A knowledge-based translation 
system for test scenarios. Proceedings of AAAI 93. 804-810. 

16. Rational Corporation 1999. Software documentation automation: a technical 
overview. http://www.rational.com/support/techpapers/tolOa.html. 




Object Orientation in Natural Language Processing 



Mostafa M. Aref 

Information & Computer Science Department 
King Fahd University of Petroleum & Minerals 
Dhahran, 31261, Saudi Arabia 
aref@kfupm.edu.sa 



Abstract. Natural Language Processing (NLP) has many applications such as 
Database user interfaces, Machine Translation, Knowledge Acquisition and 
Report Abstraction. Several approaches have been used in dealing with NLP. 
This paper describes an ongoing research project about understanding natural 
language text using object-oriented techniques. It starts with a brief introduction 
to the object-oriented paradigm and natural language processing. A description 
of the project, its different modules, and an example of text understanding are 
presented. The uses of object-oriented techniques in knowledge representation 
and morphological analysis are described. 



1 Introduction 

The main goal of natural language processing (NLP) is to get the computer to 
"understand" the input text. Understanding can be defined as transformation from one 
representation (the input text) to another (internal representation). This transformation 
involves different stages: morphological analysis, syntactic analysis, semantic 
analysis, and pragmatic analysis. At the end of these stages, a computer semantic 
interpretation of the sentences is produced. Then, different applications can use this 
interpretation such as: Machine translation, database interface, story understanding, 
and question-answering systems. Many factors contribute to the complexity of the 
understanding problem. The type of mapping is a major factor. Moreover, the 
complexity of the internal representation adds another obstacle. 

The first stage of NLP involves a parser (or syntactic analyzer) where a sequence 
of words is transformed into a structure that shows how these words are related to 
each other. A dictionary or lexicon is used directly to identify words in the input text. 
To minimize the size of the dictionary, only word stems are stored and a 
morphological analyzer is used to convert the input text words into dictionary words. 
Therefore, morphological analysis gains its importance by reducing the size of the 
dictionary. The second stage of NLP is a semantic analysis. In this stage a semantic 
representation of the input sentence's structure is constructed. Next, a discourse 
analysis looks at a group of sentences. In this stage, scope and reference problems are 
resolved. A response is generated based on the semantic representation of the 
discourse. These stages of NLP are not separated. Approaches based on sequential 
phases have shown their limitations [1]. 



R. Loganantharaj et al. (Eds.): IEA/AIE 2000, LNAI 1821, pp. 591-600, 2000. 
© Springer-Verlag Berlin Heidelberg 2000 







Object-oriented techniques are being introduced in several applications. Originally, 
object-oriented programming languages were developed for simulations of dynamic 
systems, such as Simula [2], or for powerful human interfacing capabilities, such as 
Smalltalk [3]. Object-oriented design evolved via flowcharts, structured 
programming, user-defined types, abstract data types, and then objects and classes. In 
Artificial Intelligence, object-oriented representation evolved via logic, rules, 
semantic nets, frames, and then objects and classes. The main features of 
object-oriented techniques are abstraction, encapsulation, inheritance, 
message-passing, and dynamic binding. These features attract researchers in the 
area of natural language processing. Object-oriented morphological analysis is 
introduced in [4]. Object-oriented natural language parsing is presented in [5]. 
Object-oriented knowledge representation is discussed in [6], [7], [8]. This paper 
presents a natural language understanding system that is based on object-oriented 
techniques. 

Section 2 describes the understanding system. Section 3 introduces the object- 
oriented knowledge representation. Section 4 discusses the requirements and the 
object-oriented approach for the morphological analyzer. The conclusions and 
directions of the current research are presented in Section 5. 



2 The Understanding System 

The understanding system accepts a text (a sentence or paragraph) as input. Then, the 
user may enter several questions concerning the input text. The system should answer 
questions that reflect the understanding of the input text. The system consists of the 
following 6 modules as shown in Figure 1. 

Knowledge Base: This is the main module of the system. It contains the English 
vocabulary and all the linguistic information related to this vocabulary. An object- 
oriented knowledge structure is used to represent this information. The object- 
oriented representation captures both the attributes and the behavior of the 
vocabulary. Other system modules access the knowledge base. If a word is not found 
in the vocabulary hierarchy, an error is shown to the user. For the initial 
implementation, this knowledge base contains a minimum set of the common English 
words (verbs, nouns, adjectives, adverbs, etc.) The information is stored as classes in 
a hierarchical representation. A description of the object-oriented knowledge 
representation is given in section 3. 

Semantic analyzer: Given a group of words and their morphological information, 
this module generates the objects (instances) of the classes of these words (nouns and 
verbs). Also, the semantic analyzer fills in the attribute values of the generated 
objects. The semantic analyzer only instantiates verbs and nouns. Adjectives, adverbs, 
pronouns, etc. are used in completing the object attributes. Filling these attributes 
follows syntactical and semantic rules that are expressed as a part of the vocabulary 
class hierarchy. If there is a violation of these rules, an error is shown to the user. 
Although the semantic analyzer is a module by itself, it actually represents the 
behavior of the verb class hierarchy. 

User interface: A user interface is needed to facilitate communication between the 
understanding system and the user. There is a text-window for input text, a 
query-window for user questions, an answer-window, and an error-window to display 
error messages. Different types of errors are identified such as unrecognized words, 
grammar errors, semantic errors, unresolved ambiguities, and unclear questions. An 
answer to some of the questions might be "no information available to answer". 




Fig. 1. The object-oriented understanding system 



Morphological analyzer: Given the input text, the morphological analyzer converts 
the text into groups of words in the basic form (verbs, nouns, adjectives, adverbs, 
etc.) together with their morphological information (type, tense, etc.). The 
morphological analyzer separates the affixes (prefixes or suffixes) from the input 
tokens. A description of the object-oriented morphological analyzer is given in 
section 4. 

Discourse analyzer: Given the initial object representation of a set of sentences, the 
discourse analyzer resolves references between sentences. This is done based on the 
pronouns and prepositions within these sentences. The discourse analyzer represents 
the behavior description of the pronouns and prepositions. Polymorphism, one of the 
features of object-oriented representation, plays an essential role in the discourse 
analyzer when different alternatives for resolving reference problems are considered. 

Query Language analyzer: The understanding system is tested using English 
questions. The query language analyzer parses these questions, accesses the 
object representation of the input text, and obtains and presents the answers to the 
user. 



2.1 Working Example 

To illustrate the working of the understanding system, an example of a text and a set 
of questions are given as shown in Figure 2. 








Input text: We bought a new desk. When it was delivered, we found that 
the drawer was broken. 

Questions: 1- What did we buy? 2- Who delivered the new desk? 

3- Did the new desk have a defect? 

Fig. 2. An example of input text and a set of questions 

The morphological analyzer identifies: 

• The verb stems: "buy", "deliver", "find", "be" and "break". 

• The nouns: "desk", and "drawer" . 

• The pronouns: "we", "it", and "that", and the adjective: "new". 

• The adverb: "when", the definite: "the", and the indefinite: "a" 

The semantic analyzer instantiates objects for verbs "buy", "deliver", "find", and 
"break" and for nouns "desk" and "drawer". The verbs "be" and "have" are not 
instantiated. They either play a grammatical role as auxiliary verbs or work similarly to 
adjectives by setting values for noun attributes. Some attributes of these objects are 
set by the semantic analysis. These include agent(buy1) = we, object(buy1) = desk1, 
agent(find1) = we, and object(break1) = drawer1. Some attributes are set by the 
discourse analyzer, such as object(deliver1) = desk1 and part-of(drawer1) = desk1. 
Other attributes are not set at all, such as agent(deliver1) or agent(break1). 




The query analyzer answers the first question by checking the existence of the verb 
"buy" with agent "we". The answer would be "You bought a desk". For the second 
question, although "deliver1" has the object "desk1", the agent attribute is empty. The 
answer would be "no information available to answer". The answer to the third 
question must be inferred from the generated object representation. 
Since "drawer" is the object of "break", the status of "drawer" should be set to defect. 
This would then be propagated to the status of "desk". 
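
The object representation of this example can be sketched in Python as follows. This is only an illustration under stated assumptions, not the project's implementation: the class names `Noun` and `Verb`, the `status` attribute, and the helpers `propagate_defect` and `who` are hypothetical names introduced here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Noun:
    name: str
    part_of: Optional["Noun"] = None   # e.g. part-of(drawer1) = desk1
    status: Optional[str] = None       # hypothetical attribute, e.g. "defect"


@dataclass
class Verb:
    name: str
    agent: Optional[str] = None        # stays empty when the text does not say
    obj: Optional[Noun] = None


def propagate_defect(verb: Verb) -> None:
    """Mark the object of the verb as defective, then every whole it is part of."""
    node = verb.obj
    while node is not None:
        node.status = "defect"
        node = node.part_of


def who(verb: Verb) -> str:
    """Answer a 'who' question from the agent attribute, if it was ever set."""
    if verb.agent is None:
        return "no information available to answer"
    return verb.agent


desk1 = Noun("desk")
drawer1 = Noun("drawer", part_of=desk1)     # set by the discourse analyzer
buy1 = Verb("buy", agent="we", obj=desk1)   # set by the semantic analyzer
deliver1 = Verb("deliver", obj=desk1)       # agent(deliver1) is never set
break1 = Verb("break", obj=drawer1)

propagate_defect(break1)                    # drawer1, then desk1, become "defect"
```

With these objects, the second question maps to `who(deliver1)` and the third to a check of `desk1.status`.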



3 Object-Oriented Knowledge Representation 

The objective of knowledge representation is to organize the information necessary to 
the application in such a way that it can easily be accessed and manipulated. The 
knowledge content must be sufficient to solve problems in the domain and it must be 
efficient. There are different approaches to knowledge representation. Some of these 
representations are: logical representation, procedural representation, network 
representation, and structured representation. Object-oriented knowledge 
representation is a structured representation. 

Object-oriented knowledge representation organizes knowledge into classes of 
objects, subclasses and superclasses, which is an important issue in knowledge 
representation. By this organization, a class may inherit the properties of any of its 
superclasses and it may pass properties to any one of its subclasses. Inheritance 
provides several advantages to knowledge representation, which can be summarized 
as follows. 

• It provides a natural tool for representing structured knowledge bases. 

• It provides an economical way to attach common properties to a class of objects. 

• It guarantees that all members of a class inherit the appropriate properties. 

• It ensures consistency with the class definition. 

• It reduces the size of the knowledge base. 

• It can be used easily to implement default values and exceptions. 

The most important part of object-oriented knowledge representation is the abstract 
level, the ontology. The ontology is an explicit specification of some topic. It is a 
formal and declarative representation that includes the vocabulary for referring to the 
terms in a subject area and the logical statements that describe what the terms are, 
how they are related to each other, and how they can or cannot be related to each 
other. Ontologies therefore provide a vocabulary for representing and communicating 
knowledge about some topic and a set of relationships that hold among the terms in 
that vocabulary [9]. Several research works are concerned with ontology, such as 
"Ontolingua" [9], and the "Generalized Upper Model" [10]. 



3.1 Object-Oriented Lexicon 

A lexicon is a dictionary where the linguistic information is associated with each 
word. Many words are ambiguous, in the sense that they have more than one meaning 
or more than one set of grammatical properties. In addition, words may be related 
semantically to each other. There are three basic semantic relations [11]: 

• A synonym of a lexeme is a similar one; where strong synonyms are 
equivalent lexemes and weak ones are subject to some meaning changes. 

• A hypernym of a lexeme is a more general meaning. 

• A hyponym of a lexeme is a more specific meaning (opposite to hypernym). 

An object-oriented representation is used to accommodate the grammatical 
properties and these semantic relations. In this representation, a lexeme is represented 
as an object (class). The hyponyms of a lexeme are represented as subclasses. The 
hypernym of several lexemes is represented as a superclass of their classes. Within 
each lexeme class, grammatical properties and synonym lexemes are presented as 
attributes. The linguistic information is inherited by the subclasses and may be 
overridden by specific information for that lexeme. Figure 4 shows part of the 
hierarchy of an object-oriented lexicon that includes affixes. 
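
A minimal sketch of such a lexicon, assuming Python classes stand in for lexeme classes, is given below. The lexemes chosen ("furniture", "desk", "table"), the `countable` property, the synonym entry, and the helper methods `hypernyms` and `hyponyms` are illustrative assumptions and do not come from the lexicon described in this paper.

```python
class Lexeme:
    """Root of a hypothetical lexicon hierarchy."""
    stem = ""
    category = None      # grammatical property, inherited by subclasses
    synonyms = ()        # synonym lexemes are stored as an attribute

    @classmethod
    def hypernyms(cls):
        """Stems of all more general lexemes, nearest first."""
        return [c.stem for c in cls.__mro__[1:]
                if issubclass(c, Lexeme) and c.stem]

    @classmethod
    def hyponyms(cls):
        """Stems of the directly more specific lexemes."""
        return [c.stem for c in cls.__subclasses__()]


class Furniture(Lexeme):         # hypernym of "desk" and "table"
    stem = "furniture"
    category = "noun"
    countable = False            # default value for this class


class Desk(Furniture):           # hyponym: inherits category from Furniture
    stem = "desk"
    countable = True             # specific information overrides the default
    synonyms = ("bureau",)


class Table(Furniture):
    stem = "table"
    countable = True
```

Here `Desk.category` is resolved through inheritance, while `Desk.countable` shows an inherited default being overridden for one lexeme.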








4 Morphological Analysis 

Morphological analysis concerns the word level where individual words are analyzed 
into their components, and non-word tokens, such as punctuation, are separated from 
the words. The first step is to break a sentence into tokens. Then, these tokens are 
analyzed into their components: prefixes, suffixes, and word stems. The word stems 
are the basic form of words that have been stored in the lexicon. Non-word tokens are 
separated from the words. The lexicon entry, called a lexeme, consists of a word stem 
and its linguistic information (grammatical and semantic properties). Word stems are 
checked for existence in the lexicon and their linguistic information is determined. 
The affixes (prefixes and suffixes) are used to change this linguistic information to 
determine the linguistic properties of the sentence tokens. Figure 5 shows examples of 
tokens and some grammatical information change produced by the affixes. 








[Figure 5 (diagram): the stem "initial" (noun) combines with the suffix "ize" to 
give "initialize" (verb); adding the prefix "re" gives "reinitialize" (verb); 
adding the suffix "tion" gives "reinitialization" (noun).] 



Fig. 5. Lexeme linguistic information change by the affixes 
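
The derivation in Figure 5 can be traced with a short Python sketch. The tables `SUFFIX_EFFECT` and `PREFIX_EFFECT`, the function `derive`, and the ad hoc spelling rule for "ize" + "tion" are assumptions made for illustration; they are not the paper's data structures.

```python
# Hypothetical affix tables: the grammatical category that results from
# attaching each affix. None means the category is preserved.
SUFFIX_EFFECT = {"ize": "verb", "tion": "noun"}
PREFIX_EFFECT = {"re": None}


def derive(stem, category, steps):
    """Apply (kind, affix) steps in order and record every (form, category)."""
    forms = [(stem, category)]
    for kind, affix in steps:
        if kind == "prefix":
            stem = affix + stem
            category = PREFIX_EFFECT[affix] or category
        else:
            if affix == "tion" and stem.endswith("ize"):
                # Crude stand-in for a spelling rule: "ize" + "tion" -> "ization".
                stem = stem[:-1] + "ation"
            else:
                stem = stem + affix
            category = SUFFIX_EFFECT[affix]
        forms.append((stem, category))
    return forms


forms = derive("initial", "noun",
               [("suffix", "ize"), ("prefix", "re"), ("suffix", "tion")])
```

Each entry of `forms` pairs a surface form with the category produced by the affix applied at that step, mirroring the chain shown in the figure.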



Morphological analysis is needed for several reasons [12]. It reduces the size of 
dictionaries, in which only word stems are stored. It eases data entry to the lexicon 
because not all forms of the word are needed. New words (neologisms) may be 
generated and checked following the morphology rules. To achieve these advantages, 
several requirements of morphological analysis may be expressed as follows: 

• It should recognize the normal words such as "read", "write", "June" or "Bill." 

• It should simplify regular words such as "reading" into the basic verb "read" 
with the suffix "ing" or "disadvantage" into the basic noun "advantage" with 
the prefix "dis." 

• It should determine the effect of the affixes on the word linguistic information. 

• It should simplify semi-regular words such as "writing" into the basic verb 
"write" with the suffix "ing" or "irregular" into the basic adjective "regular" with 
the prefix "in." 

• It should consider the possibility of more than one affix at the same time such as 
"reinitialization", where the stem is "initial" with one prefix "re" and two 
suffixes "ize" and "tion." 

• It should recognize irregular words such as "went" and "mice" and find their 
stems "go" and "mouse" respectively. 

• It should avoid the segmentation of stems that have some affixes such as the 
suffix "ing" in, the prefix "re" in "read", and "as", "ass", "nation" and "sin" in 
"assassination." 



4.1 Difficulties of Morphological Analysis 

There are two kinds of morphology: inflectional morphology, where affixes are added to 
a stem, and derivational morphology, where new words are formed from existing 
stems. Inflectional morphology has the following characteristics [12]: 

• It is systematic: adding an affix to a stem has the same grammatical or semantic 
effect on other stems (e.g. making a noun plural by adding "s" at the end). 

• It is productive: the new words follow the language rules. 

• It preserves the category: the broad grammatical category of the word is not 
altered by inflection process (i.e. verbs remain verbs and nouns remain nouns). 

Inflectional morphology may be subdivided further into [13]: verbal grammatical 
function change (active, passive), verbal tense (present, past), verbal agreement (verb, 
subject & object), and nominal and adjectival (number, case, gender). Derivational 
morphology does not have the above characteristics. Therefore, it is unsystematic, 
partly productive, and category-altering. Derivational morphology may be 
subdivided further into: deverbal nominal, deverbal adjectival, deadjectival, 
denominal, and prefixal derivation. It is often difficult to draw an exact borderline 
between inflectional and derivational morphology [14]. 








4.2 The Object-Oriented Approach 

Morphological analysis, as well as Natural Language Processing, raises the problem 
of ambiguities and the resulting multiple solutions. Approaches based on sequential 
phases have shown their limitations [1]. This is due to the lack of a real exchange of 
linguistic information between different phases that are needed to reduce ambiguities. 
In this section, an object-oriented algorithm for morphological analysis is presented. 
This algorithm accesses a lexicon to check the existence of word stems and to obtain 
related linguistic information. This lexicon is built as a hierarchy of the classes. 

Affixes are divided into three subclasses: prefix, infix, and suffix. Each class of 
these subclasses is subdivided into subclasses based on the size of the affix (number 
of letters). Instances of these affixes are represented by two levels. The first level 
contains the regular forms of these affixes. The second level contains their irregular 
forms. Examples of the former (three letters) are "ing", "ful", and "est". Examples of 
the irregular forms of the prefix "in" are "ir", "il", and "im". This representation 
provides the following advantages: 

• The classification of stems and affixes semantically [6]. 

• The linguistic information is distributed over several levels. Subclasses inherit 
the common attributes and override specific ones. 

• The morphological analysis routines are distributed over the levels as methods. 

4.3 The Object-Oriented Algorithm 

One of the approaches to morphological processing is known as the "two-level 
model" [14]. In this approach, two levels of representation are needed to describe the 
morphology of a language. These two levels are called the "lexical level" and the 
"surface level". When an affix is added to a stem, the result is not always just a 
concatenation of the two. Often additional processes such as reduplication, insertion, 
deletion, or umlauting of a character may occur. In the two-level morphology model, 
these processes are referred to as two-level morphology rules. 

In the object-oriented algorithm, the input is the surface level of the word. This is 
taken by tokenizing the input statement. The output is the lexical level of the word. 
This consists of the lexeme (the word stem with its linguistic information) and all 
affixes added to the stem. The algorithm itself consists of methods as a part of the 
affix classes. In every class in the affix class hierarchy, there is a method dealing with 
a particular part of the morphological analysis. There are five levels in the affix class 
hierarchy (i.e. (affix), (prefix, infix or suffix), (number of letters), (specific affix (e.g. 
"in" or "ful")), and (irregular affix)). Therefore, there are five different types of 
message-handlers. Within these message-handlers, there are two functions: the "f1: 
affix effect function" and the "f2: irregular effect function". 

The "f1: affix effect function" describes the effect of adding certain affixes to a 
stem. This effect might change the grammatical category from a noun to a verb or 
from singular to plural, etc. The "f2: irregular effect function" describes the effect of 
adding an irregular affix to a stem as reduplication, insertion, deletion or umlauting of 
a character. This is described by the two-level rules. The "f1: affix effect function" is 
part of the affix classes (e.g. "un", "pre", or "ing"). The "f2: irregular effect function" 
is part of the irregular affix classes (e.g. "ir", "il" or "es"). 







The description of the object-oriented algorithm is as follows. 

1. Level 1: (affix) Check that the token exists as a stem or as an irregular form 
stem. This is done through sending a message to the "word" subclasses. Then: 

i) If a class exists (whether for a stem or irregular stem), return the 
linguistic information of the stem. 

ii) Otherwise, send the token to the "affix" subclasses to check the 
existence of an affix. Return with the response. 

2. Level 2: (prefix, infix or suffix) Check the existence of a particular affix by 
passing a message to its class. If it does exist: 

i) Send a message to the "affix" class (Level 1) with the remainder of 
the token. 

ii) If it does not fail, apply the current "f1: affix effect function" on the 
linguistic information obtained from the "affix" class and return with 
the updated information. 

iii) Otherwise return failed. 

3. Level 3: (number of letters) Separate a specific affix length (number of 
letters) from the token. Send a message to the subclasses to determine the 
"f1: affix effect function". Then: 

i) If it does exist, return with the affix, the affix effect function, and 
the remainder of the token. 

ii) Otherwise return failed. 

4. Level 4: (specific affix (e.g. "in" or "ful")) 

i) If the affix matches then return with the current "effect function". 

ii) If not, send to the irregular subclasses. If it does not fail, return with 
the updated remainder and the current "effect function". 

iii) Otherwise return failed. 

5. Level 5: (irregular affix) Check the matching with the current irregular affix. 
If there is a match, apply the "f2: irregular effect function" on the remainder 
of the token and return the updated remainder. Otherwise return failed. 
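
The control flow of the five levels can be approximated by the Python sketch below. It is a simplification under explicit assumptions and not the implemented analyzer: the levels are collapsed into two plain functions instead of message-handlers in a class hierarchy, infixes are omitted, irregular variants are tried whenever the regular form does not lead to a known stem, and the tiny `LEXICON`, `IRREGULAR_STEMS`, and `AFFIXES` tables are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable

LEXICON = {
    "read": {"category": "verb"},
    "write": {"category": "verb"},
    "regular": {"category": "adjective"},
}
IRREGULAR_STEMS = {"went": ("go", {"category": "verb", "tense": "past"})}


@dataclass(frozen=True)
class Affix:
    form: str                    # regular surface form (level 4)
    kind: str                    # "prefix" or "suffix" (level 2)
    f1: Callable[[dict], dict]   # affix effect function
    irregular: tuple = ()        # (surface form, f2) pairs (level 5)


AFFIXES = [
    Affix("ing", "suffix", lambda info: {**info, "form": "progressive"},
          irregular=(("ing", lambda rest: rest + "e"),)),   # writ + e -> write
    Affix("in", "prefix", lambda info: {**info, "negated": True},
          irregular=(("ir", lambda rest: rest),)),          # ir + regular
    Affix("re", "prefix", lambda info: {**info, "repeated": True}),
]


def split(token, surface, kind):
    """Level 3: separate an affix of len(surface) letters; None if no match."""
    n = len(surface)
    if len(token) <= n:
        return None
    if kind == "prefix" and token.startswith(surface):
        return token[n:]
    if kind == "suffix" and token.endswith(surface):
        return token[:-n]
    return None


def analyze(token):
    """Level 1: return (stem, info, affixes), or None when analysis fails."""
    if token in LEXICON:                        # whole token is a known stem
        return token, dict(LEXICON[token]), []
    if token in IRREGULAR_STEMS:                # irregular form of a stem
        stem, info = IRREGULAR_STEMS[token]
        return stem, dict(info), []
    for affix in AFFIXES:                       # levels 2 to 4
        variants = [(affix.form, lambda rest: rest)] + list(affix.irregular)
        for surface, f2 in variants:            # level 5 for irregular variants
            rest = split(token, surface, affix.kind)
            if rest is None:
                continue
            result = analyze(f2(rest))          # remainder goes back to level 1
            if result is not None:
                stem, info, affixes = result
                return stem, affix.f1(info), affixes + [affix.form]
    return None
```

Because the lexicon is consulted before any affix is stripped, `analyze("read")` returns the stem directly and never segments it into "re" + "ad", while `analyze("writing")` fails on the regular split "writ" and succeeds through the irregular variant that restores the final "e".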



5 Conclusion 

Natural language understanding may be considered as the mapping of a natural 
language text to an internal representation that captures the meaning of that text. This 
paper has presented an ongoing research project about an understanding system. 
Object-oriented knowledge representation is used as an internal representation for the 
understanding system. An object-oriented representation of lexicon has been 
described. This lexicon consists of a class hierarchy of the stems including the affixes 
class hierarchy. This representation uses the object-oriented inheritance to reduce the 
redundant linguistic information between word stems. An object-oriented 
morphological analyzer algorithm has been described. This algorithm utilizes 
message-handlers to distribute the two-level morphological rules among different 
affixes. The object-oriented morphological analyzer has been implemented, and then 
integrated into the understanding system. Implementation of the semantic analyzer is 
currently considered as part of ongoing research in this area. 







Acknowledgment 

The author wishes to acknowledge King Fahd University of Petroleum and Minerals 
(KFUPM) for the use of its various facilities in the preparation of this paper. 



References 

1. M. Stefanini & Y. Demazeau, "TALISMAN: A Multi-Agent System for Natural 
Language Processing," in Advances in Artificial Intelligence: 12th Brazilian 
Symposium on AI, J. Wainer & A. Carvalho Pfeirer (Eds), pp. 312-322, 
Springer-Verlag, Berlin-Heidelberg, 1995. 

2. O. Dahl and K. Nygaard, "An Algol-based Simulation Language," 
Communications of the ACM, 9(9), 1966, pp. 671-678. 

3. A. Kay, The Reactive Engine, Ph.D. Thesis, University of Utah, 1969. 

4. M. Aref, "Object-Oriented Approach For Morphological Analysis," Proceedings of 
the 15th National Computer Conference, Dhahran, Saudi Arabia, pp. 5-11, 1997. 

5. Neuhaus, P. & Hahn, U., "Restricted Parallelism in Object-oriented Lexical 
Parsing," Proc. of the 16th Int. Conf. on Comp. Linguist., Copenhagen, Denmark, 1996. 

6. Aref, M., "A Bilingual Knowledge Representation," Proceedings of the 6th 
International Conference and Exhibition on Multi-lingual Computing, pp. 5.2.1- 
5.2.6, Cambridge, UK, 1998. 

7. Walczak, S., "Knowledge Acquisition and Knowledge Representation with Class: 
the Object-Oriented Paradigm," Expert Systems with Applications, 15, pp. 235- 
244. 1998. 

8. Jones, D. et al. Verb Classes and Alternations in Bangla, German, English, and 
Korean, AI Memo No. 1517, MIT, 1994. 

9. Gruber, T., "A Translation Approach To Portable Ontology Specifications," 
Knowledge Acquisition, 5(2): 199-220, 1993. 

10. Bateman, J. et al, "The Generalized upper Model Knowledge Base: Organization 
and use," The Conference on knowledge Representation and Sharing, Twente, 
Netherland, 1995 

11. Sproat, R., Morphology and Computation, MIT Press, Cambridge, Massa., 1992. 

12. Ritchie, G et al. Computational Morphology: Practical Mechanisms for the 
English Lexicon, MIT Press, Cambridge, Massachusetts, 1992. 

13. Hacken, P., "On the Definition of Inflection," in From Data to Knowledge, W. 
Gaul & D. Pfeifer (Eds), pp. 337-344, Springer-Verlag, Berlin-Heidelberg, 1996. 

14. Schiller, S. & Steffens, P. "Morphological Processing in the two-Level Paradigm," 
in Text Understanding in LILOG, O. Herzog & C. Rollinger (Editors), pp. 112-126, 
Springer-Verlag, New York, 1991. 




A Study of Order Based Genetic and 
Evolutionary Algorithms in Combinatorial 
Optimization Problems 



Miguel Rocha, Carla Vilela, and Jose Neves 

Departamento de Informatica Universidade do Minho 
Braga - Portugal 

{mrocha, jneves}@di.uminho.pt carla@labia01.di.uminho.pt 



Abstract. In Genetic and Evolutionary Algorithms (GEAs) one is faced 
with a given number of parameters, whose possible values are coded in a 
binary alphabet. With Order Based Representations (OBRs) the genetic 
information is kept by the order of the genes and not by their values. The 
application of OBRs to the Traveling Salesman Problem (TSP) is a well 
known technique in the GEA community. In this work one intends to 
show that this coding scheme can be used as an indirect representation, 
where the chromosome is the input for the decoder. The behavior of the 
GEA's operators is compared under benchmarks taken from the 
Combinatorial Optimization arena. 

Keywords: Genetic and Evolutionary Algorithms, Order Based 
Representations. 



1 Introduction 

For a considerable number of researchers, the term Genetic and Evolutionary 
Algorithm (GEA) is strongly related to the use of binary representations; 
i.e., the solution to a given problem is typically coded using a 0/1 alphabet. 
In fact, this was the representation John Holland proposed in his pioneering 
work in the field [4], and its use has been supported by numerous studies. In 
terms of the schema theorem, one can justify the binary alphabet by noticing 
that a minimal alphabet maximizes the number of hyper-plane partitions made 
available for the schema processing [12]. Furthermore, one can point out that the 
use of a universal representation, in conjunction with simple operators, makes a 
domain-independent approach easier to implement and to address theoretically. 

However, some authors have reported advantages in the use of other kinds of 
representations. It has been argued that the use of alphabets that are closer to the 
problem's data structures allows for the definition of richer genetic operators [1]. 
This has been the case of real-valued representations, now being considered to 
be more efficient in numerical optimization [6]. When, back in the 1980's, some 
researchers aimed at tackling the Traveling Salesman Problem (TSP) by using 
GEAs, it became clear that the binary representations had serious difficulties 
when handling heavily constrained problems. In 1985, Goldberg and Lingle [2] 
proposed a different representation, the Order Based Representation (OBR), that 
was based on the relative order of the genes in the chromosome. In this approach, 
each individual has a genotype that is made of a permutation of a set of values, 
given by a fixed alphabet. In the case of the TSP, the alphabet is the set of nodes 
of the particular instance being solved. The results obtained were substantially 
better, and they led to numerous studies in this area. The main problem to be 
solved was the need to develop a whole new set of operators. Several researchers 
contributed to this task, developing new operators and evaluating 
their performance, namely in the TSP [11] [5]. Some of the newly defined operators 
were designed to work with general purpose OBR individuals, while others were 
designed with the TSP in mind. 

R. Loganantharaj et al. (Eds.): IEA/AIE 2000, LNAI 1821, pp. 601-611, 2000. 
© Springer-Verlag Berlin Heidelberg 2000 

In this work, one argues that OBR is a feasible coding scheme, not only 
for the TSP, but also as an indirect representation used in solving different 
combinatorial optimization problems, namely those of Scheduling, Knapsacking 
and Graph Coloring. In this approach, an individual does not directly encode a 
solution, but instead it defines a strategy to reach one, so it depends on the work 
of a decoder that takes the genotype and, by using a given heuristic procedure, 
arrives at the solution. Typically, the heuristic is a greedy method; i.e., it takes 
the genes in the given order, and at each point builds the best solution possible. 
One's purpose is to evaluate the order based GEA and to make a comparison 
of several genetic operators, both in solving the TSP and the aforementioned 
problems. The aim is to uncover regularities in the results beyond the obvious 
and substantial differences in the problems' structures and data. 
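
To make the idea of a decoder concrete, the following Python sketch decodes a permutation for a small knapsack instance. It is an assumption-laden illustration rather than the decoder used in this study: the function name `decode_knapsack`, the "keep every item that still fits" rule, and the instance data are all invented here.

```python
def decode_knapsack(order, weights, values, capacity):
    """Greedy decoder: scan items in chromosome order, keep each one that fits."""
    chosen, load, total = [], 0, 0
    for item in order:
        if load + weights[item] <= capacity:
            chosen.append(item)
            load += weights[item]
            total += values[item]
    return chosen, total


# Hypothetical instance with four items.
weights = [4, 3, 2, 5]
values = [10, 7, 4, 12]

# Two chromosomes (permutations of the item indices) decode to two solutions.
solution_a = decode_knapsack([3, 0, 1, 2], weights, values, capacity=9)
solution_b = decode_knapsack([1, 2, 0, 3], weights, values, capacity=9)
```

Every permutation decodes to a feasible solution, so the genetic operators never have to repair capacity violations; the second component of each result can serve as the fitness of the chromosome.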

The paper is organized as follows: it starts with a description of the basic 
concepts of GEAs and the practical work so far developed; next, the problems to 
be addressed are defined as well as their software structure; finally, some results 
are presented and discussed. 

2 Genetic and Evolutionary Algorithms 

2.1 Basic Concepts 

In this work the term Genetic and Evolutionary Algorithm (GEA) is used to 
name a family of computational procedures that share a set of common features: 

— there are a number of potential solutions (individuals) to a problem, evolving 
simultaneously (a population); 

— each individual represents a solution to a problem, which is coded by a string 
(chromosome) of symbols (genes), taken from a well defined alphabet; 

— the individuals are evaluated; i.e., to each of them is assigned a numeric 
value (fitness), that stands for their quality, in solving the problem; 

— the solutions to the problem can be recombined and/or changed in some 
way, by using genetic operators (e.g., crossover, mutation), in order to create 
new solutions (reproduction); 

— the process is evolutionary; i.e., it is based on the Darwinian process of 
natural selection, where the fittest individuals have greater chances of surviving; 

— its major structure is the one outlined in the pseudo-code of Figure 1. 






BEGIN 
  Initialize time (t = 0). 
  Generate and evaluate the individuals in the initial population (P_0). 
  WHILE NOT (termination criteria) DO 
    Select from P_t a number of individuals for reproduction. 
    Apply to those individuals the genetic operators to breed the offspring. 
    Evaluate the offspring. 
    Select the offspring to insert into the next population (P_{t+1}). 
    Select the survivors from P_t to be reinserted into P_{t+1}. 
    Increase current time (t = t + 1). 
END 



Fig. 1. Structure of a GEA 
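
The loop of Figure 1 can be sketched in a few lines of Python. This is only an illustrative sketch, not the GEPE implementation: the function name `run_gea`, the binary tournament used for parent selection, the fixed number of generations as termination criterion, and the elitist reinsertion of the best survivors are all assumptions made here, and fitness is minimized.

```python
import random


def run_gea(init_individual, fitness, breed, pop_size=20, n_offspring=10,
            generations=50, rng=None):
    """Generational loop in the spirit of Figure 1 (fitness is minimized)."""
    rng = rng or random.Random(0)
    # Generate and evaluate the individuals in the initial population P_0.
    population = [init_individual(rng) for _ in range(pop_size)]
    scored = sorted(((fitness(ind), ind) for ind in population),
                    key=lambda pair: pair[0])

    def pick_parent():
        # Assumed selection method: binary tournament on the current P_t.
        a, b = rng.sample(scored, 2)
        return a[1] if a[0] <= b[0] else b[1]

    for _ in range(generations):  # assumed termination criterion
        # Apply the genetic operators to breed the offspring, then evaluate.
        offspring = [breed(pick_parent(), pick_parent(), rng)
                     for _ in range(n_offspring)]
        scored_offspring = [(fitness(child), child) for child in offspring]
        # Survivors from P_t (the best ones, an assumption) join P_{t+1}.
        survivors = scored[:pop_size - n_offspring]
        scored = sorted(survivors + scored_offspring, key=lambda pair: pair[0])
    return scored[0]  # (best fitness, best individual)
```

Because the best survivors are always reinserted, the best fitness in this sketch never gets worse from one generation to the next.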



2.2 Order Based Representations 

In an individual's Order Based Representation (OBR), the genetic information 
is based on the order of the genes, which take values from a fixed set, 
with the constraint that each value is unique and that every value in the set 
is on the chromosome; i.e., the chromosome is a permutation of the symbols in 
a given alphabet. The constraint on non-duplicates justifies the development of 
a whole new class of genetic operators, namely the crossover and the mutation 
ones. 

A crossover operator is defined to take two individuals as input (the 
ancestors) and return two different ones (the offspring). In this study a number 
of different crossover operators, designed to work with OBR individuals, were 
used: 

— Order Preserving Crossover (OPX) 

The OPX family emphasizes the relative order of the genes from both par- 
ents. The algorithm works by selecting a random cutting point and then 
taking all genes, from the beginning to the cutting point, from one parent. 
The other parent is used to fill in the remaining genes, by preserving their 
relative order. The process can be generalized to n cutting points, although 
only values of one (OPX1) and two (OPX2) were considered. 

— Uniform Order Preserving Crossover (UOPX) 

This operator has some similarities with the previous one. It works with a 
randomly generated binary mask. In all positions marked with 1, the off- 
spring takes the gene from the first parent, in the same position. The second 
parent is used to fill in the gaps, preserving the relative order of the nodes [1]. 

— Partially Matched Crossover (PMX) 

Under the PMX [2] two crossing points are randomly chosen, defining a 
matching section on the string, used to effect a cross between the two parents, 
through position-to-position exchange operations. 

— Cycle Crossover (CYCX) 

Cycle crossover [9] performs recombination under the constraint that each 
gene in a certain position must come from one parent or the other. 






— EDGe Crossover (EDGX) 

The EDGX is based on the principle of maintaining all possible pairs of 
adjacent genes on the chromosome. It was specially designed for the TSP[11]. 
The algorithm works by collecting the neighborhood information, for each 
gene, in a table of adjacencies, from both ancestors. 

— Maximum Preservative Crossover (MPX) 

The MPX operator was designed by Mühlenbein [7] with the purpose of tackling 
the TSP by preserving, in the offspring, sub-tours contained in the two 
parents. 

— SCHleuter Crossover (SCHX) 

The SCHX [10] is a variation of the MPX, with some features similar to the 
OPX ones, and also contemplating the inversion of partial tours. 
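
As an illustration of the two general purpose operators above, the following sketch shows one possible reading of OPX1 and UOPX on Python lists. The function names and the choice to return a single child are assumptions made here; the second offspring can be obtained by swapping the roles of the two parents.

```python
def opx1(parent1, parent2, cut):
    """One-point order preserving crossover: the head comes from parent1,
    the remaining genes follow in the relative order they have in parent2."""
    head = list(parent1[:cut])
    used = set(head)
    return head + [gene for gene in parent2 if gene not in used]


def uopx(parent1, parent2, mask):
    """Uniform order preserving crossover: positions marked 1 in the mask
    keep the gene of parent1; the gaps are filled from parent2, in order."""
    child = [gene if bit else None for gene, bit in zip(parent1, mask)]
    used = {gene for gene in child if gene is not None}
    filler = iter([gene for gene in parent2 if gene not in used])
    return [gene if gene is not None else next(filler) for gene in child]


p1 = [1, 2, 3, 4, 5, 6]
p2 = [6, 5, 4, 3, 2, 1]
print(opx1(p1, p2, 2))                   # [1, 2, 6, 5, 4, 3]
print(uopx(p1, p2, [1, 0, 1, 0, 1, 0]))  # [1, 6, 3, 4, 5, 2]
```

Both children are valid permutations by construction, which is the property the non-duplicate constraint demands of every OBR operator.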

A mutation typically induces a small change to the genotype of an 
ancestor, returning one offspring. A parameter called the Mutation Rate (MR) 
sets the probability with which a mutation operator is applied to a 
particular position of the genotype. In this work, four different mutation 
operators were considered: 

— ADJacent swap (ADJ) - Swaps the positions of the selected gene and the 
next one on the string; 

— Non-ADJacent swap (NADJ) - Swaps positions between the current gene 
and a different one at a random position on the string; 

— K-PERMutation (KPERM) - Given a parameter k, it scrambles a sub-list of 
size k, starting at the current position on the string; 

— INVersion (INV) - Given a parameter k, it inverts a sub-list of size k, 
starting at the current position on the string. 
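
The four mutation operators can be sketched as follows. The function names are hypothetical, the wrap-around of ADJ at the end of the string is an assumption (the text does not say what happens at the last position), and sub-lists that would run past the end of the string are simply truncated.

```python
import random


def adj_swap(perm, i):
    """ADJ: swap the gene at position i with the next one (wraps around)."""
    out = list(perm)
    j = (i + 1) % len(out)
    out[i], out[j] = out[j], out[i]
    return out


def nadj_swap(perm, i, rng):
    """NADJ: swap the gene at position i with one at another random position."""
    out = list(perm)
    j = rng.choice([p for p in range(len(out)) if p != i])
    out[i], out[j] = out[j], out[i]
    return out


def kperm(perm, i, k, rng):
    """KPERM: scramble the sub-list of size k starting at position i."""
    out = list(perm)
    segment = out[i:i + k]
    rng.shuffle(segment)
    out[i:i + k] = segment
    return out


def inversion(perm, i, k):
    """INV: invert the sub-list of size k starting at position i."""
    out = list(perm)
    out[i:i + k] = out[i:i + k][::-1]
    return out


print(adj_swap([1, 2, 3, 4], 1))         # [1, 3, 2, 4]
print(inversion([1, 2, 3, 4, 5], 1, 3))  # [1, 4, 3, 2, 5]
```

Each operator returns a new list and leaves the ancestor untouched, so the result is always a permutation of the same alphabet.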



2.3 The Genetic and Evolutionary Programming Environment 

The Genetic and Evolutionary Programming Environment (GEPE) was built 
with the purpose of increasing productivity when developing applications 
with GEAs [8]. It takes advantage of the features of the object-oriented paradigm, 
identifying a common background in the existing approaches, allowing for 
modularity and incremental development. 

The framework developed is made of four main blocks (Figure 2), namely 
the individuals, the populations, the GEAs and the evaluation module. Each of 
these modules is materialized by a hierarchy of classes, built so that the 
common attributes and behaviors are defined in the root classes, and 
a process of specialization is followed as one walks to the leaves, redefining or 
adding new attributes and/or behaviors. 

Fig. 2. The GEPE's Archetype 

At the individual's level, the root is an abstract class with a template field 
that contains its genotype; i.e., its genetic information. In this way, one can 
support any kind of representational scheme, simply by instantiating 
the template with the necessary data type (Figure 3). To implement, under this 
framework, the OBR individuals, one considers that any alphabet of cardinality n 
can be represented by integers from 1 to n. In this way, the template field is 
instantiated with the integer type. A subclass OBRIndiv is created, where one 
defines the set of crossover and mutation operators described. 
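
The idea of a templated root class specialized into OBRIndiv can be mimicked in Python with generics. This is a loose sketch only: the source does not show GEPE's actual class interfaces, so the class name `Individual` and the method `is_valid` are assumptions introduced for illustration.

```python
from abc import ABC, abstractmethod
from typing import Generic, List, TypeVar

G = TypeVar("G")  # the gene type, playing the role of the template field


class Individual(ABC, Generic[G]):
    """Abstract root: holds a genotype whose gene type is a parameter."""

    def __init__(self, genotype: List[G]) -> None:
        self.genotype: List[G] = list(genotype)

    @abstractmethod
    def is_valid(self) -> bool:
        """Hypothetical hook: does the genotype respect the representation?"""


class OBRIndiv(Individual[int]):
    """Order based individual: a permutation of the integers 1..n."""

    def is_valid(self) -> bool:
        n = len(self.genotype)
        return sorted(self.genotype) == list(range(1, n + 1))


print(OBRIndiv([3, 1, 2]).is_valid())  # True
print(OBRIndiv([1, 1, 2]).is_valid())  # False
```

In such a design the crossover and mutation operators would live in the OBRIndiv subclass, while binary or real-valued individuals would instantiate the same root with a different gene type.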



[Figure: class diagram. Nodes: Abstract class; Order-based representation; Binary representation; Real-valued representation. Legend: template instantiation; sub-class.] 

Fig. 3. The Individuals' class hierarchy 



At the population and GEA levels, similar strategies are followed, allowing for 
the easy definition of default behaviors, but also for the possibility of redefining 
the parameters, such as the selection methods, the re-insertion parameters, or 
the structure of the overall algorithm. 

The last of the modules in the system is the evaluation one, where the pro- 
grammer defines the decoding procedure; i.e., how to reach a solution to a given 
problem, starting with the chromosome, and how to assign a fitness to the solu- 
tion. This module makes the connection between the GEA and the problem to 
solve. 

In GEPE the concept of genetic operator was generalized, to allow for m 
individuals as input, and n individuals as output. For each application, the user 
supplies a table of operators to be used, among the feasible ones. Each operator 
is associated with a selection value (probability), used when generating offspring. 
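
A table of operators with selection probabilities can be sampled as in the sketch below. The dictionary layout and the function name `pick_operator` are assumptions made here, not GEPE's interface; the weights are illustrative.

```python
import random


def pick_operator(table, rng):
    """Draw one operator name with probability proportional to its weight."""
    names = list(table)
    weights = [table[name] for name in names]
    return rng.choices(names, weights=weights, k=1)[0]


# Hypothetical table: one crossover and one mutation operator.
operator_table = {"UOPX": 0.75, "INV": 0.25}
rng = random.Random(0)
draws = [pick_operator(operator_table, rng) for _ in range(10000)]
share = draws.count("UOPX") / len(draws)  # close to 0.75 for a long run
```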

3 Problem Formulation 

3.1 The Traveling Salesman Problem 

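The Traveling Salesman Problem (TSP) is stated in terms of an n-dimensional cost 
matrix of values $d_{ij}$, where the purpose of the exercise is to obtain a permutation 
of the n nodes such that the sum of the costs $d_{ij}$, for every i and j with i the 
predecessor of j in the sequence, is minimal. Let $x_{ij} = 1$ if node j immediately 
follows node i in the sequence and $x_{ij} = 0$ otherwise, and let V be the set of nodes. 
More formally, one has: 

\[ \text{Minimize: } \sum_{i=1}^{n} \sum_{j=1}^{n} d_{ij} x_{ij} \tag{1} \]

\[ \text{Subject to: } \sum_{j=1}^{n} x_{ij} = 1, \quad \forall i \tag{2} \]

\[ \sum_{i=1}^{n} x_{ij} = 1, \quad \forall j \tag{3} \]

\[ x_{ij} \in \{0, 1\}, \quad \forall i, j \tag{4} \]

\[ \sum_{i \in S} \sum_{j \in S} x_{ij} < |S|, \quad \forall S \subset V, \; S \neq \emptyset \tag{5} \]

When $d_{ij} = d_{ji}, \forall i, j$, one is faced with a Symmetric TSP (STSP); otherwise 
the problem is said to be an Asymmetric TSP (ATSP). 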

3.2 The Knapsacking Problem 

The 0/1 Knapsacking problem deals with a set of n objects, each characterized by a 
given weight ($W_i$) and profit ($P_i$) [6]. One aims to select a subset of those objects 
so as to maximize the sum of their profits, while preventing the sum of their weights 
from exceeding a given capacity C. More formally, one has: 

\[ \text{Maximize: } \sum_{i=1}^{n} x_i P_i \]

\[ \text{Restrictions: } \sum_{i=1}^{n} x_i W_i \leq C \]

\[ x_i \in \{0, 1\}, \quad \forall i \]



3.3 The Scheduling Problem 

Scheduling problems are concerned with decision-making processes that produce 
plans, allotting the work to be done and the time for it. Part of the scheduling 
problems can be described in terms of the Job Shop Scheduling Problem (JSSP), 
where one has a set $J$ of $n$ tasks, a set $M$ of $m$ machines, and a set $O$ of 
operations. Each operation $op \in O$ belongs to a task $j_{op} \in J$ and is assigned to a 
machine $m_{op} \in M$, on which it is processed for a given time $t_{op}$. There is also a 
temporal binary ordering relation $\rightarrow$ that decomposes the set $O$ into a group of 
partially ordered sets according to the tasks; i.e., if $x \rightarrow y$ then $j_x = j_y$, and 
there is no $z$, different from $x$ or $y$, such that $x \rightarrow z$ or $z \rightarrow y$. 
Taking as the objective the minimization of the time elapsed in processing 
all tasks, the problem consists of seeking a start time $s_{op}$ for each operation 
$op$, such that $\max_{op \in O} (s_{op} + t_{op})$ is minimized, subject to the 
invariants: 

(i) $t_{op} \geq 0$, $\forall op \in O$ 

(ii) $s_x - s_y \geq t_y$ if $y \rightarrow x$, with $x, y \in O$ 

(iii) $(s_i - s_j \geq t_j) \lor (s_j - s_i \geq t_i)$ if $m_i = m_j$, with $i, j \in O$ 

3.4 The Graph Coloring Problem 

Consider a graph, with a numerical weight associated with each node; given n 
different colors, the Graph Coloring problem consists in achieving the highest 
score by assigning to each node of the graph one color from the set. It is required 
that no pair of connected nodes can have the same color. The total score of a 
feasible solution is the sum of the weights for the colored nodes. 

4 The Implementation 

The GEAs designed to solve each of the given problems were implemented under 
the GEPE framework, and used OBR. The approaches differ only 
at the evaluation module level. In the TSP case the decoding of the solution is 
quite straightforward, since the order of the genes corresponds to the order of 
the nodes in the solution. The fitness is the sum of the costs of the edges in 
the solution. In the other problems, the strategy used is an indirect one. The 
genotype is used as the order in which the different items are taken, and a 
heuristic procedure is used to create the solution based on that information. 

In the Knapsacking problem, each gene represents an object, and one builds 
a solution by inserting objects into the knapsack in the order given by the chromosome. 
The fitness is the sum of the profits of the objects in the solution. The Graph 
Coloring problem uses a similar strategy: the nodes are colored in the order given 
by the chromosome, ensuring that no connected nodes receive the same color, and 
the solution is evaluated by summing the weights of the colored nodes. In the 
JSSP, the chromosome represents a sequence of orders. The heuristic 
takes the orders in sequence and schedules them in the best way possible; 
i.e., it allocates one order at a time without violating any of the constraints, 
minimizing the time it takes to finish. The fitness is the total time necessary 
for the completion of a given portfolio. 
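
The decoders described above can be sketched as follows, using 0-based indices for nodes and objects. These are illustrative guesses at the heuristics, not the authors' code: the TSP tour is assumed to be closed, the knapsack decoder is assumed to skip an object that does not fit and continue with the next one, and the coloring decoder is assumed to give each node the first free color and leave it uncolored when none is free.

```python
def tsp_cost(order, dist):
    """Sum of edge costs along the tour, returning to the start (assumed)."""
    n = len(order)
    return sum(dist[order[i]][order[(i + 1) % n]] for i in range(n))


def knapsack_decode(order, weights, profits, capacity):
    """Greedy decoder: take objects in chromosome order while they fit."""
    chosen, load = [], 0
    for obj in order:
        if load + weights[obj] <= capacity:
            chosen.append(obj)
            load += weights[obj]
    return chosen, sum(profits[obj] for obj in chosen)


def coloring_decode(order, adjacency, weights, n_colors):
    """Greedy decoder: color nodes in chromosome order with a free color."""
    color = {}
    for node in order:
        taken = {color[nb] for nb in adjacency[node] if nb in color}
        free = [c for c in range(n_colors) if c not in taken]
        if free:
            color[node] = free[0]
    return color, sum(weights[node] for node in color)


dist = [[0, 1, 2], [1, 0, 3], [2, 3, 0]]
print(tsp_cost([0, 1, 2], dist))                              # 6
print(knapsack_decode([0, 1, 2], [4, 3, 2], [10, 7, 4], 6))   # ([0, 2], 14)
triangle = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
print(coloring_decode([0, 1, 2], triangle, [5, 3, 1], 2)[1])  # 8
```

The knapsack and coloring examples show why the order matters: with the same data, the chromosome [1, 2, 0] yields a knapsack profit of 11 instead of 14, and the chromosome [2, 1, 0] yields a coloring score of 4 instead of 8.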



5 Results 

For each of the problems referred to above, a representative instance was 
selected. In the TSP case, the STSP and ATSP variants were considered, taken 
from TSPLIB [3]. The Graph Coloring instance was taken from [1]. The 
Knapsacking and Scheduling instances were generated using stochastic simulators. In 
the former case, the concepts from [6] were used to create an instance with 
200 objects, an average capacity and a weak correlation between profits 
and weights. In the latter, the instance of the JSSP was a typical portfolio of 50 
orders, in an environment of 5 machines. 

Each run was defined to have two genetic operators: one of crossover and 
one of mutation. The crossover operator was responsible for generating 75% 
of the offspring, while the mutation one generated the remaining 25% (with a 
mutation rate of 5% per gene). The results for each crossover/mutation pair were 
obtained by averaging the best result obtained in 20 runs, with random initial 
populations. In the TSP and JSSP problems the GEA was run for 1000 iterations 
with populations of 200 individuals, while for the other problems the number of 
generations was 500 and the population size 100. 

Tables 1 to 5 show the results obtained so far. One immediate conclusion is 
that the crossover operators designed with the TSP in mind do not behave well 
in the other problems, as one would expect. On the other hand, the UOPX, a 
general purpose operator, behaves remarkably well in all cases and in 
combination with all the mutation operators, which suggests that it is robust. 
One reason for this success can probably be found in the way this operator 
maintains genetic diversity: since it is quite disruptive, it prevents 
excessively homogeneous populations. 



Table 1. Experimental results for the STSP-Eil51 problem 

Crossover   Mutation Operator 
Operator    ADJ        NADJ       K-PERM     INV 
OPX1        762.8      567.0      658.7      676.6 
OPX2        569.1      517.8      540.2      531.2 
UOPX        475.3      466.3      464.9      462.3 
PMX         570.8      573.0      582.3      634.8 
CYCX        864.9      604.0      725.0      806.4 
EDGX        459.6      467.5      457.1      499.6 
SCHX        469.4      580.0      500.5      517.2 
MPX         482.8      470.0      473.5      500.6 

Table 2. Experimental results for the ATSP-ft53 problem 

Crossover   Mutation Operator 
Operator    ADJ        NADJ       K-PERM     INV 
OPX1        12821.6    9974.8     12875.0    10619.2 
OPX2        9600.5     8985.0     9352.6     9264.5 
UOPX        8500.2     8625.1     8210.8     9412.8 
PMX         9857.0     9466.1     9915.1     12757.2 
CYCX        14187.1    10615.9    13653.7    11959.5 
EDGX        8528.8     8586.0     8361.4     8684.3 
SCHX        8756.0     10632.2    10834.2    10931.5 
MPX         8445.7     8133.5     8349.0     8541.4 

Table 3. Experimental results for the Knapsacking problem 

Crossover   Mutation Operator 
Operator    ADJ        NADJ       K-PERM     INV 
OPX1        11585.5    12601.3    11869.8    11881.1 
OPX2        12778.5    12881.3    12831.6    12840.7 
UOPX        12966.2    12972.2    12964.9    12962.4 
PMX         12818.2    12853.7    12622.9    12585.1 
CYCX        12788.5    12816.0    12799.1    12790.1 
EDGX        11582.4    11698.2    11448.3    11578.8 
SCHX        12480.6    12574.0    12451.4    12439.7 
MPX         11754.4    11786.2    11572.1    11581.4 

Table 4. Experimental results for the Scheduling problem 

Crossover   Mutation Operator 
Operator    ADJ        NADJ       K-PERM     INV 
OPX1        5019.3     1095.4     2929.9     1607.7 
OPX2        3024.1     1214.8     1554.3     1797.5 
UOPX        1127.4     1133.9     1139.5     1141.4 
PMX         1098.7     1165.1     1172.4     1235.1 
CYCX        5956.2     1092.3     2083.0     3022.5 
EDGX        2428.8     2956.4     2100.0     2763.1 
SCHX        1213.5     1464.3     1358.0     1220.9 
MPX         1109.8     1116.9     1235.1     1199.2 

Table 5. Experimental results for the Graph Coloring problem 

Crossover   Mutation Operator 
Operator    ADJ        NADJ       K-PERM     INV 
OPX1        9894.7     9787.4     10003.2    10151.2 
OPX2        10407.5    10464.4    10409.7    10387.8 
UOPX        10503.0    10540.4    10516.7    10533.5 
PMX         10370.8    10429.0    10420.4    10437.8 
CYCX        10154.2    10443.3    10296.4    10285.3 
EDGX        9861.9     9995.0     9932.0     9992.4 
SCHX        10373.5    10229.8    10412.2    10407.3 
MPX         9947.5     10001.9    10033.6    9966.3 

6 Conclusions and Future Work 

When one looks at Nature, the kind of genetic representation used is highly 
indirect; i.e., it relies heavily on the embryogenetic mechanisms that translate 
from an abstract quaternary alphabet into the diversity of life one may observe. 
In the computational counterpart, one believes that the trend is to increase 
the complexity of the decoders while keeping the representations simple. The 
complexity of the systems must emerge from the combination of a simple evolutionary 
process, with general purpose operators, and a set of decoding procedures 
given by straightforward heuristic methods. This work showed that, although the 
structure of the problems may change, one could find a set of genetic operators 
with a good level of performance. 

In the future one intends to work on several other problems (e.g., vehicle 
routing, clustering problems) in order to further generalize these results. The 
work on embryogenesis within GEAs is also a topic under study. 

References 

1. Lawrence Davis. Handbook of Genetic Algorithms. Van Nostrand Reinhold, 1991. 

2. D. Goldberg and R. Lingle. Alleles, Loci and the Traveling Salesman Problem. In 
J. Grefenstette, editor, Proc. of the 1st Intern. Conf. on Genetic Algorithms and 
their Applications, Hillsdale, New Jersey, 1985. Lawrence Erlbaum Assoc. 

3. G. Reinelt. TSPLIB 95. Technical report, 1995. 

4. John Holland. Adaptation in Natural and Artificial Systems. PhD thesis, University 
of Michigan, Ann Arbor, 1975. 

5. K. Mathias and D. Whitley. Genetic Operators, the Fitness Landscape and the 
Traveling Salesman Problem. In R. Manner and B. Manderick, editors, Parallel 
Problem Solving from Nature 2, Brussels, Amsterdam, 1992. Elsevier. 

6. Z. Michalewicz. Genetic Algorithms + Data Structures = Evolution Programs. 
Springer-Verlag, USA, third edition, 1996. 

7. H. Mühlenbein. Evolution in time and space - the parallel genetic algorithm. In 
G. Rawlins, editor, Foundations of Genetic Algorithms, pages 316-337. 
Morgan-Kaufman, 1991. 

8. J. Neves, M. Rocha, H. Rodrigues, M. Biscaia, and J. Alves. Adaptive Strategies 
and the Design of Evolutionary Applications. In Proc. of the Genetic and 
Evolutionary Computation Conference (GECCO99), Orlando, Florida, USA, 1999. 

9. I. M. Oliver, D. J. Smith, and J. Holland. A Study of Permutation Crossover 
Operators on the Travelling Salesman Problem. In J. Grefenstette, editor, Proc. of 
the 2nd Intern. Conf. on Genetic Algorithms and their Applications. Lawrence 
Erlbaum Assoc., July 1987. 

10. M. G. Schleuter. ASPARAGOS - An Asynchronous Parallel Genetic Optimization 
Strategy. In J. D. Schaffer, editor, Proc. of the 3rd ICGA. George-Mason Univ., 
Morgan Kaufman, 1989. 

11. T. Starkweather, S. McDaniel, K. Mathias, D. Whitley, and C. Whitley. A 
Comparison of Genetic Sequencing Algorithms. In R. Belew and L. Booker, editors, Proc. 
of the 4th ICGA, San Diego, July 1991. Morgan-Kaufmann Publishers. 

12. D. Whitley. A Genetic Algorithm Tutorial. Statistics and Computing, 4, 1994. 



Nuclear Power Plant Preventive Maintenance Planning 
Using Genetic Algorithms 



Vili Podgorelec (1), Peter Kokol (1), and Andrej Kunej (2) 

(1) Laboratory for System Design, University of Maribor - FERI 
Smetanova 17, SI-2000 Maribor, Slovenia 
{Vili.Podgorelec, Kokol}@uni-mb.si 

(2) Nuclear Power Plant Krsko, Vrbina 12, Krsko, Slovenia 



Abstract. To achieve the highest reliability level of a nuclear power plant 
operation a preventive maintenance program is used that must be effective and 
efficient. To improve the existing maintenance a pilot project concerning 
electrical components was started in the NPP Krsko. We developed and 
implemented a new co-evolutionary method to optimize preventive 
maintenance activities by finding an optimal time-plan and selecting 
appropriate maintenance personnel allocation. 



1 Introduction 

It has often been recognized that exact heuristic methods are quite inefficient for very 
complex real world scheduling problems. Therefore many non-exact or soft 
methods have been used lately, which do not guarantee optimal solutions but obtain 
reasonably good solutions in a relatively short time. Among them, genetic algorithms 
are a very reasonable possibility, which we have already used successfully to solve a 
patient scheduling problem in physiotherapy [4] and diagnosis optimization [5]. 

Genetic algorithms are adaptive heuristic search methods [6,7] which may be used 
to solve all kinds of complex search and optimization problems [8]. They are based on 
the evolutionary ideas of natural selection and genetic processes of biological 
organisms. Natural populations evolve according to the principles of natural 
selection and "survival of the fittest", first laid down by Charles Darwin; by 
simulating this process, genetic algorithms are able to evolve solutions to real-world 
problems, provided these have been suitably encoded. They are often capable of finding 
optimal solutions even in very complex search spaces, or at least they offer 
significant benefits over other search and optimization techniques. 

One very important aspect of natural evolution is not included in traditional genetic 
algorithms: the impact of the environment on the evolution of a single species, and the 
natural phenomenon of co-evolution. In nature, two or more species quite often 
have to evolve with respect to each other to become better suited to competing for 
resources in the environment. Similarly, in many engineering problems 
there are a few more or less independent parts to be optimized, and the 
final result is combined from all of them. We used the principle of co-evolution to 
solve the double scheduling problem of preventive maintenance optimization, where 
time-plans of maintenance activities are the first, and allocations of maintenance 
personnel to activities the second, type of individuals in our co-evolutionary model. 

R. Loganantharaj et al. (Eds.): IEA/AIE 2000, LNAI 1821, pp. 611-616, 2000. 

© Springer-Verlag Berlin Heidelberg 2000 



2 Preventive Maintenance Program in a Nuclear Power Plant 

Reliability should be one of the main concerns in nuclear power plant (NPP) 
operation. To achieve a significant reliability level, reliability must be 
continuously managed, which is not possible without a proper preventive maintenance 
program (PMP) that is effective and efficient [1,2]. Considering the potential 
efficiency of genetic algorithms in many engineering applications [3], a pilot project 
concerning electrical components was started in the NPP Krsko. 

The objectives of a PM program are to prevent equipment breakdown, to increase 
equipment reliability and availability, to extend its life, and to maintain the 
equipment in a satisfactory condition in all modes of operation. PMP includes three 
types of maintenance activities: 1) predictive maintenance is the continuous or periodic 
monitoring and diagnosing of components in order to forecast equipment failure; 2) 
periodic maintenance is an activity based on operating hours or calendar time; and 3) 
planned maintenance is an activity performed prior to equipment failure, which can be 
initiated by predictive or periodic maintenance results, by manufacturer 
recommendation, operating experience etc. 

Electrical components are a large group of equipment which requires a permanent 
team of maintenance and engineering personnel. In NPP Krsko, PMP incorporates 
three classes of activities concerning electrical components: 1) inspection covers 
external component examination; current, vibration and temperature measurement; oil 
and bearing checking; 2) revision covers external and internal motor examinations, 
checking/replacing oil or grease, electrical measurements etc.; and 3) overhaul covers 
complete dismantling, cleaning, replacing wear parts, measuring, testing etc. The above 
activities are performed continuously during normal plant operation. Activities on 
components that cannot be carried out during operation are planned for 
regular outages or planned shutdowns. 

Considering the optimization of PMP, one has to define the time-plan in which the 
activities are to be performed, and allocate available resources to 
specific activities. As the final result we would like to obtain the shortest possible 
solution (in terms of time) with minimal resources (number of personnel) 
fulfilling the given constraints. When scheduling activities, there are several 
constraints to be fulfilled: 

- certain activities have to be completed before some others can start, 

- some activities can be performed simultaneously, and some others cannot, 

- the number of simultaneously performed activities is limited by the available 
personnel, 

- each activity can be performed only by trained personnel, 

- for each activity there is a given minimum and maximum number of personnel 
that can perform that activity, and associated times in which it will be done, 

- some workers can simultaneously work on more than one activity and some 
others cannot, and 

- the coordinator of the activity is always selected from NPP Krsko personnel. 





Fig 1. Two parts of the global solution for the PMP optimization problem that is being evolved 
through our co-evolutionary model. The arrows indicate precedence constraints and the 
numbers indicate the task identifier. If something is changed in one part it influences the other 
part and also the global solution. 



Considering all of the above constraints, the whole optimization problem can be divided 
into two parts: 1) defining an optimal time-plan for all activities, and 2) allocating 
personnel to activities. The two parts clearly interact at several points. For 
example, the duration of an activity depends on how many workers are working on it, 
and two simultaneously scheduled activities cannot share the same personnel 
(with the exception of those who can work on more than one activity at the 
same time). One possible solution to the given problem can be seen in Figure 1. 



3 Co-evolutionary Model for PMP Optimization 

Co-evolution requires the genetic algorithm to work simultaneously on more than one 
type of individuals (two in our case), coupled into the same search space and 
evaluated with one general cross-evaluation function (Figure 2). The evolution of a single 
species (one type of individuals) can be seen as a traditional genetic algorithm process, 
but one general cross-evaluation function is used to evaluate both populations. In this 
manner one very important aspect of natural evolution is considered, namely the 
influence of some species on the evolution of another (the environment factor is added). 
The cross-evaluation function has to be defined in such a manner that it evaluates the 
global solution based on a combination of one individual from each population. 




Fig 2. Co-evolutionary process. Individuals from both populations are evolved independently 
of each other, but the evaluation function works on combinations of one individual from each 
population according to the cross-evaluation scheme. 



There are several possible schemes for selecting the individuals from the populations 
that will be cross-evaluated. The simplest is the one-to-one cross-evaluation scheme 
(Figure 3.a), where both populations have to be of the same size. There are only a few 
combinations (exactly the population size), so it is quite fast, but many possibly 
good combinations are left out, so it is not very effective. The second is the full 
cross-evaluation scheme (Figure 3.b), where all possible combinations of individuals 
from both populations are used. In this manner it is guaranteed that the best 
combination will be considered, but it takes a lot of time (imagine models with several 
co-evolving populations). The last is the moderated cross-evaluation scheme (Figure 3.c), 
where only some combinations are used. This scheme represents a compromise 
between the first two possibilities and should be used if the full cross-evaluation scheme 
would require too much computing time. 
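
The three schemes differ only in which pairs of individuals are handed to the cross-evaluation function, which the following sketch makes concrete. The function names are hypothetical, and the moderated scheme here draws its pairs uniformly at random, which is an assumption: the paper states that its own selection was based on the individuals' ranking.

```python
import itertools
import random


def one_to_one_pairs(pop_a, pop_b):
    """One-to-one scheme: the i-th individuals are paired (equal sizes)."""
    if len(pop_a) != len(pop_b):
        raise ValueError("one-to-one scheme needs equally sized populations")
    return list(zip(pop_a, pop_b))


def full_pairs(pop_a, pop_b):
    """Full scheme: every combination of one individual from each side."""
    return list(itertools.product(pop_a, pop_b))


def moderated_pairs(pop_a, pop_b, fraction, rng):
    """Moderated scheme: only a fraction of all combinations is evaluated."""
    pairs = full_pairs(pop_a, pop_b)
    k = max(1, round(fraction * len(pairs)))
    return rng.sample(pairs, k)


time_plans = list(range(90))   # stand-ins for 90 time-plan individuals
allocations = list(range(60))  # stand-ins for 60 allocation individuals
print(len(full_pairs(time_plans, allocations)))  # 5400
print(len(moderated_pairs(time_plans, allocations, 0.17,
                          random.Random(0))))    # 918
```

With population sizes of 90 and 60, the full scheme costs 5400 evaluations per generation, while evaluating 17% of all pairs costs 918.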















[Figure 3 panels: a) one-to-one cross-evaluation; b) full cross-evaluation; c) moderated cross-evaluation] 

Fig 3. Different cross-evaluation schemes that can be used for the cross-evaluation function in the 
co-evolutionary process. Only the one-to-one cross-evaluation scheme requires the same population 
size for both populations being cross-evaluated. 



We have applied the described co-evolutionary method to the PMP 
optimization. For the first population, time-plans for activities have been used as 
the individuals to be optimized (Figure 1.a). A time-plan was represented as a 
time order in which activities have to be performed, considering the possibilities of 
simultaneity according to the dependencies between activities. For the second 
population, personnel allocations have been used (Figure 1.b): distributions of 
available personnel to activities considering the given constraints. 

Clearly, all changes in one population affect the global solution and 
therefore the evolution of the other. Since all genetic operators change the individuals, 
they have to be defined so as to guarantee that the feasibility of the obtained 
solutions is preserved. We managed to do this by representing time-plans of activities 
only by time order and not by absolute time intervals. Exact time intervals are then 
calculated in accordance with the cross-selected personnel allocation plan in the 
cross-evaluation phase to obtain a valid solution. 
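
How exact time intervals might be derived from a time order and a personnel allocation can be sketched as follows. This is a simplified illustration, not the authors' procedure: the function name and data layout are hypothetical, every worker is assumed to handle one activity at a time, the time order is assumed to list each activity after its predecessors, and the duration of an activity is looked up by crew size.

```python
def decode_schedule(order, predecessors, allocation, duration):
    """Turn a time order plus a personnel allocation into start times.

    order        -- activities in the order they are considered
    predecessors -- activity -> activities that must finish first
    allocation   -- activity -> set of workers assigned to it
    duration     -- activity -> {crew size: processing time}
    """
    start, finish, worker_free = {}, {}, {}
    for act in order:
        crew = allocation[act]
        # Earliest time allowed by the precedence constraints.
        ready = max((finish[p] for p in predecessors.get(act, ())), default=0)
        # Earliest time at which every assigned worker is available.
        free = max((worker_free.get(w, 0) for w in crew), default=0)
        start[act] = max(ready, free)
        finish[act] = start[act] + duration[act][len(crew)]
        for w in crew:
            worker_free[w] = finish[act]
    return start, max(finish.values(), default=0)


order = ["A", "B", "C"]
predecessors = {"C": ["A"]}
allocation = {"A": {"w1"}, "B": {"w2"}, "C": {"w1", "w2"}}
duration = {"A": {1: 4}, "B": {1: 6}, "C": {1: 8, 2: 5}}
start, makespan = decode_schedule(order, predecessors, allocation, duration)
print(start, makespan)  # {'A': 0, 'B': 0, 'C': 6} 11
```

Because start times are always computed from the order and the allocation, any permutation that respects the precedence order yields a feasible schedule, which is the reason for not encoding absolute intervals in the individuals.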



4 Results 

We tested our method on a segment of the PMP for electrical components in NPP 
Krsko. It consisted of 41 maintenance activities, and the number of available maintenance 
personnel was 12. We optimized the PMP with regard to different criteria: 1) emergency 
24-hour non-stop maintenance, 2) planned 8-hours-per-day maintenance, 3) economy 
low-cost 24-hour maintenance. For all situations the obtained results were very good. 
Table 1 shows average results over 5 runs, evolved after 1000 generations, for the 
emergency 24-hour non-stop maintenance optimization. The comparison is made 
between the manually scheduled process, the results obtained with our method for the 
same constraints, and the case when the maintenance personnel consists of 20 people 
instead of 12. Our co-evolutionary method found a solution that needs less than a 
quarter of the time of the manually generated one (52:14 versus 228:00). Population 
sizes were 90 for the time-plans and 60 for the personnel allocation. We used 17% of 
all pairs for the cross-evaluation, based on individuals' ranking. 










Table 1. Comparison of three solutions. The first is manually scheduled and the last two are 
obtained with our co-evolutionary method. 

                   manually scheduled   1st GA scheduled   2nd GA scheduled 
time order         based on CPM         equal to manual    equal to manual 
personnel          12                   12                 20 
overall duration   228:00               52:14              34:40 



5 Conclusion 

Based on the results of the performed tests, we believe that our method 
can be used efficiently for PM optimization in NPPs. Because of its generality it 
can also be used, with minor changes, for several other cases of process optimization. 
Before giving a final verdict, though, we will have to compare it with some other 
automated methods, as the comparison with a manual procedure may not be fair, since 
the complete procedure is deterministic and guaranteed to terminate. 

A good confirmation of the quality of the developed method was the fact that it found 
the same time order for the activities as the one scheduled manually, based on the CPM 
method. With a different resource allocation, however, the evolved solution had a more 
than four times shorter overall duration than the manually generated one. 



References 

[1] Kokol, P., Kunej, A.: EOP - The Robust User Oriented Paradigm for 
Designing Engineering Software Systems, In Proceedings of DPIC, John 
Wiley, Chichester (1991) 

[2] Laakso, K., Simola, K., Holmberg, B.: Examples of Reliability Assessment of 
Maintenance in Finnish Nuclear Power Plants, In Proceedings of IAEA Meeting, 
IAEA, Stockholm (1990) 

[3] Dasgupta, D., Michalewicz, Z.: Evolutionary Algorithms in Engineering 
Applications, Springer - Verlag, Berlin, Heidelberg (1997) 

[4] Podgorelec, V., Kokol, P.: Knowledge-based Directed Genetic Algorithm for 
Highly Constrained Scheduling Problems, In Proceedings of the International 
ICSC Symposium on Engineering of Intelligent Systems EIS'98 (1998) 

[5] Kokol, P., Podgorelec, V., Maleic, I.: Diagnostic Process Optimisation with 
Evolutionary Programming, In Proceedings of 11th IEEE Symposium on 
Computer Based Medical Systems CBMS'98, pp. 62-67, IEEE Computer 
Society Press (1998) 

[6] Baeck, T.: Evolutionary Algorithms in Theory and Practice, Oxford University Press, 
Inc. (1996) 

[7] Forrest, S: Genetic Algorithms, ACM Computing Surveys, 1(28) 77-80 (1996) 

[8] Goldberg, D.E.: Genetic Algorithms in Search, Optimization, and Machine Learning, 
Addison Wesley, Reading MA (1989) 







Progress Report: 

Improving the Stock Price Forecasting Performance 
of the Bull Flag Heuristic 
with Genetic Algorithms and Neural Networks 

William Leigh¹, Edwin Odisho¹, Noemi Paz¹, and Mario Paz² 

¹ University of Central Florida, Department of MIS, Orlando, FL 
² University of Louisville, Department of Civil Engineering, Louisville, Kentucky 

Abstract. We back-test a pattern-based heuristic from stock market technical 
analysis on price and volume time series data for Alcoa Aluminum Company’s 
common stock. Promising results are obtained using a pattern matching 
approach implemented with spreadsheet technology. Improvements in these 
results are attained through the application of neural networks and genetic 
algorithms. Results are confirmed statistically. 



1 Introduction 

Approaches to the prediction of stock market prices are characterized as fundamental 
or technical. The assumption underlying prediction from the fundamental point of 
view is that prices in financial markets are based on economic and managerial 
mechanisms which may be understood and predicted based on realities of interest 
rates, cost trends, competitive forces, and so forth. Graham and Dodd [1] wrote the 
classic guide to fundamental analysis of investments years ago. The technical 
approach does not deny the validity of the fundamentalist approach but recognizes 
that available knowledge of the fundamentals of a particular investment is available to all 
and has been factored into the current market price. Technical analysis focuses on 
the dynamics of the market behavior itself. Charles Dow developed the original Dow 
theory for technical analysis in 1884. A modern explication is found in Edwards and 
Magee [2]. 

Technical analysis is based on the recognition of patterns in the price and volume 
statistics of stocks. According to technical analysis, certain patterns signal imminent 
price rises and declines. Patterns may be applied to individual stocks or to 
aggregations of stocks or to whole markets. Many how-to books and magazine 
articles describe the technical analysis of stocks. The technical approach has little 
academic credibility. The generally accepted efficient markets hypothesis, explained 
and surveyed by Fama [3], states that market prices follow a random walk and cannot 
be predicted based on their past behavior. However, several researchers, [4], [5], [6], 
report using neural nets to predict future market prices based on past prices. 

Considerable work exists in the academic literature concerning the validation and 
verification of expert systems (surveyed in Weiss and Kulikowski [7]). Technical 
analysis includes a set of heuristics based on market patterns which can be 
operationalized as an expert system to advise stock market purchases and sales. That 
expert system can then be validated and verified using the accepted method of 
cross-validation through back-testing on actual data, thus giving some support to the worth 
of technical analysis (in the case that profits are realized through simulated market 
trades based on the expert system’s recommendations), or failing to support the worth 
of technical analysis (in the case of losses). This is the approach of this paper. 

R. Loganantharaj et al. (Eds.): IEA/AIE 2000, LNAI 1821, pp. 617-622, 2000. 
© Springer-Verlag Berlin Heidelberg 2000 

2 Pattern Recognition of a Stock Chart Pattern 

We work with one pattern from technical analysis, the “bull flag.” The definition for 
“flag” from Downes and Goodman [8]: 

FLAG — technical chart pattern resembling a flag shaped 
like a parallelogram with masts on either 
side, showing a consolidation within a 
trend. It results from price fluctuations 
within a narrow range, both preceded and 
followed by sharp rises or declines. 

A “bull flag” is a flag followed by a “breakout” in the positive direction. Downes and 
Goodman’s definition of “breakout”: 

BREAKOUT — rise in a security’s price above a resistance 
level (commonly its previous high price) or 
drop below a level of support (commonly the 
former lowest price). A breakout is taken to 
signify a continuing move in the same 
direction. 

Our bull flag pattern includes a descending band of consolidation followed by a 
breakout in the positive direction. The template for this bull flag pattern is shown in 
Figure 1. The vertical dimension is price or sales volume, and the horizontal 
dimension is time. There are ten columns and ten rows in this template. The columns 
are mapped to the dates in the time series of data, and the rows are mapped to the price or 
volume values of the time series. For example, the first column in Figure 1 specifies 
that the data points for the dates associated with that column are 
equally distributed over the top four rows, that is, 25% of the points appear in each of 
the top four rows. 



[Figure 1: 10×10 grid of template weights; each nonzero cell holds 25%.]

Fig. 1. “Bull Flag” Pattern Template. 








To compute the degree of match between the bull flag template and the grid of 
values derived from the time series data, the percentage of values which falls in each 
cell of a column is multiplied by the weight in the corresponding cell of the bull flag 
template. This cross-correlation computation is done for the 10 cells in the column 
and summed, resulting in a fit value for the column. Thus, 10 column fit values for 
price and 10 column fit values for volume are computed for each trading day. 
Summing all 20 values for a trading day results in a total fit for the trading day. 

This process is an example of template matching, a pattern recognition technique 
used to match an image to a template. The two-dimensional image $I_{price}(x,y)$ is 
synthesized in the 10×10 grid from the 120-day window on the time series of price 
values as follows: 

$p_t$ = price value on trading day $t$ 

$t = 0$ is the current trading day in the 120-day window (which becomes the last, 
or rightmost, day in the image) 

The increment in price value between each of the 10 grid rows is computed: 

$$p_{max} = \max \{\, p_t : -119 \le t \le 0 \,\}$$

$$p_{min} = \min \{\, p_t : -119 \le t \le 0 \,\}$$

$$y_{increment} = (p_{max} - p_{min}) / 10$$

The row $y_t$ in the 10×10 grid is determined using the increment: 

$$y_t = \begin{cases} y & \text{if } p_{min} + (y-1)\, y_{increment} \le p_t < p_{min} + y\, y_{increment}, \\ 10 & \text{if } p_t = p_{max} \end{cases}$$

The column $x_t$ is computed using the day in the window, which ranges from -119 to 0, 
and the number of days which are represented in a column, 12: 

$$x_t = \begin{cases} z & \text{if } -119 + 12(z-1) \le t < -119 + 12z, \\ 10 & \text{if } t = 0 \end{cases}$$

The proportion of values in a cell in the column is computed by counting the number 
of values in the 120-day window which correspond to that cell and dividing the total 
by 12: 

$$I_{price}(x,y) = \left|\{\, p_t : y_t = y,\ x_t = x,\ -119 \le t \le 0 \,\}\right| / 12$$

To compute the degree of match between the 10×10 template and the 10×10 image 
synthesized from the time series values, the cross-correlation is computed between a column in 
the bull flag pattern template $P(x,y)$ and the corresponding column in the 
image $I_{price}(x,y)$: 

$$R_{price,column} = \sum_{y=1}^{10} P(column, y)\, I_{price}(column, y)$$

The values for volume and the degree of match $R_{volume,column}$ 
between a column in the pattern template and the corresponding column in the image 
$I_{volume}(x,y)$ may be computed in a similar manner. In this way the time series of 
price and volume data is correlated with the bull flag pattern template to yield a 
series of correlation values for each column, $R_{price,column,t}$ and $R_{volume,column,t}$. 
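The template-matching steps above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' spreadsheet implementation: the function names are hypothetical, the template is passed in as a list of ten columns of ten weights, and rows are indexed from 0 for the lowest band (the paper's y = 1), so a template must be supplied in that same orientation. A window with no price movement is placed entirely in the lowest row, which is an assumption the paper does not address.

```python
def synthesize_image(values, n_cols=10, n_rows=10):
    """Bin a window of values into a grid of per-column proportions.

    values[0] is the oldest day in the window (t = -119 for 120 days) and
    values[-1] is the current day (t = 0). image[x][y] is the share of the
    days in column x whose value falls in row y (y = 0 is the lowest band).
    """
    days_per_col = len(values) // n_cols           # 12 for a 120-day window
    v_min, v_max = min(values), max(values)
    increment = (v_max - v_min) / n_rows
    image = [[0.0] * n_rows for _ in range(n_cols)]
    for i, value in enumerate(values):
        x = min(i // days_per_col, n_cols - 1)
        if increment == 0:
            y = 0                                  # flat window (assumption)
        else:
            # The maximum value is clamped into the top row, as in the text.
            y = min(int((value - v_min) / increment), n_rows - 1)
        image[x][y] += 1.0 / days_per_col
    return image


def column_fits(template, image):
    """Cross-correlate each template column with the matching image column."""
    return [sum(w * p for w, p in zip(t_col, i_col))
            for t_col, i_col in zip(template, image)]


def total_fit(template, prices, volumes):
    """Sum of the 10 price column fits and the 10 volume column fits."""
    return (sum(column_fits(template, synthesize_image(prices)))
            + sum(column_fits(template, synthesize_image(volumes))))
```

Calling `total_fit` once per trading day, each time with the most recent 120 prices and volumes, gives the daily total fit series used in the following sections.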







3 Initial Results 

Closing price and daily volume data from the first stock (alphabetically) in the Dow 
Jones Industrial Average are used: 

Alcoa, Inc. (stock symbol: AA) World’s largest aluminum manufacturer 
with operations in more than 30 countries; integrated operations include 
bauxite mining, refining, and smelting, and the manufacture of aluminum 
products, including automotive components and beverage cans. 

Table 1 shows the aggregate results for buying and holding the stock for 30 trading 
days for the best (highest correlation) 1000 trading day total fits (the 20 column fits 
added together), out of 3,479 trading days in the period from 1985 to 1998. These 
results are compared with the overall averages for buying on every trading day in the 
period and holding for 30 trading days. The t-test used is one-tailed with the assumption of different 
variances. The improvement in 30-day mean profit for the buys indicated by the 
pattern fitting quality over the 30-day mean profit for purchasing on all trading days is 
statistically significant at less than the 1% level. 
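The comparison described above can be sketched as follows. The helper names are hypothetical, profit is assumed to be the percentage change in closing price over the holding period, and the fit series is assumed to be aligned with the profit series day by day. The unequal-variance t-test is the one commonly called Welch's t-test; the sketch computes only the statistic and its approximate degrees of freedom, leaving the p-value lookup to a statistics package.

```python
from statistics import mean, variance


def holding_profits(closes, horizon=30):
    """Percentage profit from buying at each close and selling `horizon` days later."""
    return [100.0 * (closes[i + horizon] - closes[i]) / closes[i]
            for i in range(len(closes) - horizon)]


def best_fit_days(total_fits, n_best):
    """Indices of the n_best days with the highest total fit, in date order."""
    ranked = sorted(range(len(total_fits)),
                    key=lambda i: total_fits[i], reverse=True)
    return sorted(ranked[:n_best])


def welch_t(sample_a, sample_b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    va = variance(sample_a) / len(sample_a)
    vb = variance(sample_b) / len(sample_b)
    t = (mean(sample_a) - mean(sample_b)) / (va + vb) ** 0.5
    df = (va + vb) ** 2 / (va ** 2 / (len(sample_a) - 1)
                           + vb ** 2 / (len(sample_b) - 1))
    return t, df
```

With `profits = holding_profits(closes)` and `days = best_fit_days(fits, 1000)`, the two samples to compare would be `[profits[i] for i in days]` and `profits` itself.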

Table 1: Initial Results for Best 1000 Total Fit Trading Days Compared with all Trading Days. 

For All Trading Days
    30-day Mean profit         1.56%
    Annualized return         13.13%
Best 1000 Fits
    30-day Mean profit         2.46%
    Annualized return         20.74%
Comparison on Mean Profit
    t-test significance        0.00214




4 Improving the Initial Results 

In an attempt to improve the price prediction results obtained using the total fit values 
alone, we modify the process to involve genetic algorithms and neural networks. First 
we compute a percentile score for each total fitting correlation value, computed from 
the first day in the series forward. Thus, the determination of the “best” fits is based 
on comparison only with historical values. For purposes of this initial experiment the 
trading days with fits below the 80th percentile are removed from the test data, so the initial pattern 
fitting heuristic is used as a filter on the possible trading days. 
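A percentile score that looks only at historical values can be sketched as below. The exact percentile definition is not given in the text, so this sketch assumes one: the share of strictly earlier days whose total fit is lower than the current day's. The function names and the treatment of the first day (no score) are likewise assumptions.

```python
from bisect import bisect_left, insort


def historical_percentiles(total_fits):
    """Percentile of each day's fit relative to earlier days only.

    Returns None for the first day, which has no history to compare with.
    """
    seen = []        # earlier fits, kept sorted
    scores = []
    for fit in total_fits:
        if seen:
            scores.append(100.0 * bisect_left(seen, fit) / len(seen))
        else:
            scores.append(None)
        insort(seen, fit)
    return scores


def filter_days(scores, cutoff=80.0):
    """Indices of days whose historical percentile is at or above the cutoff."""
    return [i for i, s in enumerate(scores) if s is not None and s >= cutoff]
```

Because each score uses only data available on that day, the filter could have been applied in real time, which is the point of computing it "from the first day in the series forward".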

The template fitting process described above results in 20 column fitting 
correlation values for each trading day. We use the 20 values as the input variables to 
a conventional backpropagation-learning neural network. A genetic algorithm is 
configured to find the neural network parameter settings (stop epochs, learning rate, 
and number of nodes in hidden layers) by maximizing the R-squared value between the 
actual 30-day-later stock values and the predicted values [9]. Three non-overlapping 
time period samples, each consisting of 150 trading days in the top 20 percent of fits, are 
identified. The first 100 days of each sample are used for training and testing the genetic algorithm 
and neural network system. The last 50 days of each sample are used for validating 
the neural network which is found. These 150-day samples are selected so that there 
is a gap of at least 30 trading days between the first 100 trading days in the sample and 
the last 50, so that the training profit percentage values do not corrupt the validity of 
the validation sample. 



Table 2: Results for Three Samples with Neural Network Improvement. 

Sample                                           1          2          3
Average profit % for complete 50 trading
day validation sample                         -6.79%      5.89%      4.90%
Standard deviation                             8.63       3.57       3.08
Average profit % for neural net indicated
buys in validation sample                     -2.07%      5.77%      5.10%
Standard deviation                             5.22       3.59       3.28
t-test significance                            0.00177    0.43474    0.39157






Table 2 presents the results of the attempted improvement. Only sample 1 
resulted in a marked change in the average profit percentage. But sample 1 is the only 
sample with a negative overall profit percentage, and the improvement, though the 
average is still negative, is large and statistically significant. 

5 Conclusion 

Technical analysis, though its academic reputation is less than sanguine, may be 
effective in predicting stock prices. Neural networks and genetic algorithms may 
offer the promise of improving the results obtained by technical analysis pattern 
recognition methods alone. Much work remains to be done in this investigation. 
Initial results reported here are promising. 



References 

1. Benjamin Graham and David Dodd, Security Analysis (Classic 1934 Edition), 
McGraw-Hill, New York, 1997. 

2. Robert Edwards and John Magee, Technical Analysis of Stock Trends, Seventh 
Edition, Amacom, New York, 1997. 

3. Eugene F. Fama, “Efficient Capital Markets: II”, The Journal of Finance, 
Volume XLVI no. 5 December 1991, pages 1575-1617. 








4. A. N. Refenes and M. Azema-Barac, “Neural Network Applications in Financial 
Asset Management”, Journal of Neurocomputing and Applications, vol. 2, no. 1, 
pp. 13-39, 1994. 

5. Emad W. Saad, Danil V. Prokhorov, and Donald C. Wunsch, “Comparative 
Study of Stock Trend Prediction Using Time Delay, Recurrent and Probabilistic 
Neural Networks,” IEEE Transactions on Neural Networks, Vol. 9, no. 6, 
November 1998, pages 1456-1470. 

6. Tetsuji Tanigawa and Ken’ichi Kamijo, “Stock Price Pattern Matching System - 
dynamic Programming Neural Network Approach,” Proceedings of the 
International Joint Conference on Neural Networks, 1992, pages II-465-II-471. 

7. Sholom M. Weiss and Casimir A. Kulikowski, Computer Systems That Learn, 
Morgan Kaufmann, San Mateo, 1991. 

8. John Downes and Jordan Elliot Goodman, Dictionary of Finance and Investment 
Terms, fifth edition, Barron’s Educational Series, Inc., 1998. 

9. Vittorio Maniezzo, “Genetic Evolution of the Topology and Weight Distribution 
of Neural Networks,” IEEE Transactions on Neural Networks, Vol. 5, no. 1, 
January 1994, pages 39-53. 




Advanced Reservoir Simulation Using Soft Computing 

G. Janoski¹,*, F.-S. Li, M. Pietrzyk¹, A.H. Sung¹, S.-H. Chang², and R.B. Grigg² 

¹ Department of Computer Science 
* corresponding author, tel: (505) 835-5126, fax: (505) 835-5587 
sung@nmt.edu 

² Petroleum Recovery Research Center 
New Mexico Institute of Mining and Technology 
Socorro, New Mexico 87801, USA 



Abstract. Reservoir simulation is a challenging problem for the oil and gas 
industry. A correctly calibrated reservoir simulator provides an effective tool 
for reservoir evaluation that can be used to obtain essential reservoir 
information. A long-standing problem in reservoir simulation is history 
matching, which is to find a suitable set of values for the simulator's input 
parameters such that the simulator correctly predicts the fluid (oil, gas, water, 
etc.) outputs of the wells on the reservoir, over the time period of interest. Due 
to the sheer size of the problem, completely satisfactory results of history 
matching have been difficult and expensive to achieve. This paper presents a 
novel technique of using fuzzy control to solve history matching. Intended for 
implementation on a cluster of PCs, our technique aims not only to solve 
history matching faster, but also to solve it at a lower cost. Preliminary results 
and ongoing work are described. 

Keywords: Reservoir Simulation, History Matching, Fuzzy Control, Parallel 
Processing 



1 Introduction 

Oil and gas production in the U.S. (and eventually in most oil-producing nations) will 
be increasingly dependent on improved oil recovery methods, due to the gradually 
diminishing reserves and the high cost of exploration and drilling new wells. This has 
contributed to the increasing acceptance of reservoir simulation, where a simulator is developed to model 
the reservoir, and once the reservoir simulator is correctly calibrated it is used to 
obtain essential information about the reservoir and as a valuable means for 
performing reservoir characterization. 

The correct calibration of the reservoir simulator hinges on finding a set of input 
parameters (permeability, relative permeability, etc.), such that the simulated fluid 
(oil, gas, water, etc.) production of each well matches its actual production, across the 
time of interest. This is the well-known history-matching problem in reservoir 
modeling and its challenging nature can be attributed to three compounding factors: 

1. The size of the problem. For example, our 25 well project deals with a reservoir 
modeled by a multilayer grid that has a solution space of 150^^^^ ^ 2^^^^"^, which is 
clearly intractable for any solution algorithm based on enumerative search. 

2. Inadequate computing power. In view of the size of common history match 
models, even supercomputers, which are economically unfeasible for small producers, 
may be hard-pressed for the task. 



R. Loganantharaj et al. (Eds.): IEA/AIE 2000, LNAI 1821, pp. 623-628, 2000. 
© Springer-Verlag Berlin Heidelberg 2000 







3. The need for human intervention. The normal method of solution involves an 
iterative cycle of having a simulation expert simulate, check, and adjust the 
parameters till a satisfactory match is obtained. This results in a long turnaround and 
throwaway runs. 

Problem size is an irreducible factor; indeed, it will increase, since accurate 
simulations depend on using finer grids. The other two obstacles, however, can be 
overcome by implementing an intelligent controller for automatic parameter 
adjustment to minimize human intervention, and executing the integrated 
simulator-controller on a cluster of PCs communicating through the Web. This paper describes 
our effort to develop soft computing based techniques to solve the history 
matching problem. A fuzzy controller that performs web-based, parallel simulation 
and history matching has been implemented for an oil reservoir in New Mexico that was the site of a CO2 
injection project. Promising preliminary results that show the 
potential of our approach, as well as ongoing work, are presented. 



2 Background 

This section gives a condensed description of MASTER, our reservoir simulator; the 
carbon dioxide (CO2) foam pilot program at the East Vacuum Grayburg/San Andres 
Unit (EVGSAU) reservoir operated by Phillips Petroleum Company; and the history 
matching problem that became the foundation for our investigation of Web based 
reservoir simulations. More details are available in the paper of Chang and 
Grigg (1998) and its references. 



2.1 The MASTER Simulator 



MASTER (Miscible Applied Simulation Techniques for Energy Recovery), developed 
by the U.S. Department of Energy, is a multicomponent pseudomiscible simulator 
that simultaneously tracks stock tank oil, natural gas, water, up to 4 solvent species 
and a surfactant. Natural gas and solvent one are allowed to partition between the 
gas, oil, and aqueous phases; while solvents two, three, and four partition between the 
gas and oil phases only. The surfactant exists in the aqueous phase only. The partial 
differential equations describing the multicomponent, multiphase flow (Aziz & 
Settari, 1979) are as follows: 



[Equations garbled in the source and not reproduced here: flow equations for stock tank oil, water plus surfactant, the soluble species, and the surfactant.]

where the $R_{iw}$ are zero for $i > 1$. 







The equations are discretized using volume integration and standard finite difference 
techniques. For more details, nomenclature, and solution algorithm, refer to Ammer 
et al. (1991). 



2.2 The CO2 Project 

To advance the CO2-foam technology for improved oil recovery, a pilot area in 
EVGSAU (in Lea County, New Mexico) was selected in 1990 by DOE as a site for a 
foam field trial to comprehensively evaluate the use of foam for improving the 
effectiveness of CO2 injection projects. Operation of the foam field trial began 
in 1991 and ended in 1993. It successfully demonstrated (Martin et al., 1995) that a 
strong foam could be formed in situ, and that the diversion of CO2 to previously 
bypassed zones/areas, due to foam, resulted in increased oil production and 
dramatically decreased CO2 production. 

As part of the CO2 project, the MASTER simulator was modified by incorporating 
a foam model and used to conduct a history match study on the pilot area. The 
ultimate purpose was to establish a foam predictive model for CO2-foam processes. 
Details of the history match effort and results are reported in Chang and Grigg (1998). 



2.3 History Match Model 



The project pilot area is an "inverted nine-spot" pattern with 8 producers 
(wells 7, 8, 9, 12, 14, 17, 18, 19) and 1 foam injector in the center (well 13). 
The pilot area of 9 wells and the 16 surrounding wells outside the pilot area 
(8 injectors and 8 producers) were included in the history match model. The 
layout of the wells is depicted in Figure 1. 



[Figure 1: well layout of the history match model; the legend distinguishes wells to be matched, production wells, and injection wells.]

Fig. 1. History Match Pilot Area 



The history match model consisted of a 16 x 16 grid in 7 separate layers (based on 
type-log zonation) for a total of 1792 grid blocks. Injection rates and bottom-hole 
pressures were specified as well constraints in the history model. The complete 
history match simulation would involve three phases of simulations: (1) primary 
depletion from 1959 to 1979, (2) water-flood from 1980 to 1985, and (3) CO2-flood 
WAG (water alternating gas) injection from 1985 to 1992. 







3 MASTER Web 

MASTER Web is a PC-cluster-based, simulated parallel and distributed system 
specifically designed to carry out reservoir simulation and history matching using 
MASTER. Its benefits are simplicity, portability, adaptability, and, most importantly, 
low cost. The development of a low-cost, high-performance system to support the 
soft computing based techniques for reservoir simulation described in this paper is an 
important part of our project. However, to save space and focus on the 
AI-related aspects of the project, we omit the details of MASTER Web and refer those 
interested to the project's website, where a description of MASTER Web 
and other material can be found: http://baervan.nmt.edu/~co2/project/. In the following 
section, we describe some ongoing work that is being carried out using MASTER 
Web. 

4 Fuzzy Controller 

Fuzzy logic has been used to solve various problems and has demonstrated itself as an 
effective tool in many engineering applications (Klir & Yuan, 1995; Kosko, 1997; 
Jang et al., 1997). Applications of fuzzy logic for petroleum engineering are also 
found in the literature (Xiong et al., 1995; Li et al., 1999). This section describes a 
fuzzy controller for automatic parameter adjustment in using MASTER for a reduced- 
scale version of our history match problem. The purpose of the controller is to 
eliminate human intervention in the history matching process so that throughput will 
be greatly increased and cost will thus be reduced. 

4.1 Problem Overview 

This version of history matching is restricted to the primary oil production period. 
During this period, wells 8 and 12 were the only production wells. In order to reduce 
the number of the input parameters, the following simplifications are made: the 
permeability ratios of each layer to the top layer are assumed to be constants based on 
the type log analysis, and interwell permeabilities can be estimated by an interpolation 
algorithm based on the inverse distance weighted average and the top-layer 
permeabilities of the 25 wells. 

The first simplification reduces the number of input parameters from 16x16x7 to 16x16, 
and the second reduces it further from 16x16 to 25. Therefore, the 
only parameters being adjusted for the history match are the 25 top-layer 
permeability values for blocks containing a well. In this preliminary study only the 
oil production of wells 8 and 12 was matched. 
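The two simplifications can be illustrated with a short sketch that expands a handful of top-layer well permeabilities into a full layered grid. It is not the project's interpolation code: the function names, the distance exponent of 2, the use of grid-block indices as coordinates, and the sample values are all assumptions made for illustration.

```python
import math


def idw_permeability(block, wells, power=2.0):
    """Inverse-distance-weighted average of well permeabilities at a block.

    block: (column, row) of the grid block.
    wells: dict mapping (column, row) of a well block to its top-layer
           permeability.
    """
    num = 0.0
    den = 0.0
    for (wx, wy), perm in wells.items():
        dist = math.hypot(block[0] - wx, block[1] - wy)
        if dist == 0.0:
            return perm                  # the block contains a well
        weight = 1.0 / dist ** power
        num += weight * perm
        den += weight
    return num / den


def permeability_field(wells, layer_ratios, size=16):
    """Build a layers x size x size field from well values and layer ratios.

    layer_ratios[k] is the constant ratio of layer k to the top layer,
    so layer_ratios[0] would normally be 1.0.
    """
    top = [[idw_permeability((x, y), wells) for y in range(size)]
           for x in range(size)]
    return [[[ratio * top[x][y] for y in range(size)] for x in range(size)]
            for ratio in layer_ratios]
```

With 25 entries in `wells` and 7 entries in `layer_ratios`, the 25 adjustable values determine all 16x16x7 block permeabilities, which is what makes the reduced problem tractable for the controller.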

4.2 Fuzzy Controller for MASTER Web 

One of the goals of MASTER Web is to significantly reduce human interaction in 
the history matching process. This means that a controller should "guide" MASTER Web 
to the correct solution. For each input $X$, and the results obtained with that input, the 
controller should generate $X_{new}$, i.e., the next input for the simulator. Below, we give 
a high-level description (omitting all the details) of this automated process. 

Let us assume that the input for the MASTER simulator is a vector 
$X = [x_1, \ldots, x_n]$, and the output is a $3m$-tuple 
$Y(X) = (L_1, \ldots, L_m, W_1, \ldots, W_m, G_1, \ldots, G_m)$, 
where $m$ is the number of wells, $L_i = [l_i^1, \ldots, l_i^k]$ is the oil result for 
well $i$, $W_i = [w_i^1, \ldots, w_i^k]$ is the water result for well $i$, 
$G_i = [g_i^1, \ldots, g_i^k]$ is the gas result for well $i$, and $k$ is the number of 
years for which the history data and simulator output are available. Further, let 
$Y' = (L'_1, \ldots, L'_m, W'_1, \ldots, W'_m, G'_1, \ldots, G'_m)$ represent the history data. 

For each well $i$ ($1 \le i \le m$), we define a weight vector $E_i = [e_i^1, \ldots, e_i^n]$, 
where $-1 \le e_i^j \le 1$ represents the "impact" of the $j$-th component of the input vector $X$ 
on well $i$. These weight vectors are provided by experts based on past experiments 
and knowledge about the geology of the field. 

For the result $Y(X)$ we define the following scalar "goodness" measures¹: 

$$S_l = \sum_i (L'_i - L_i), \quad S_w = \sum_i (W'_i - W_i), \quad S_g = \sum_i (G'_i - G_i),$$

$$S^2_l = \sum_i (L'_i - L_i)^2, \quad S^2_w = \sum_i (W'_i - W_i)^2, \quad S^2_g = \sum_i (G'_i - G_i)^2,$$

$$r_l = \sum_i cc(L'_i, L_i), \quad r_w = \sum_i cc(W'_i, W_i), \quad r_g = \sum_i cc(G'_i, G_i).$$

We define the adjustment function 
$F(S_l, S_w, S_g, S^2_l, S^2_w, S^2_g, r_l, r_w, r_g) = [f_1, \ldots, f_n]$, where $f_i$ 
is a real number and the vector represents how "far" the result is from the actual 
history, based on the above "goodness" measures. This function has to be agreed on 
by experts. $F$ can be a "fuzzy" function (i.e., a set of inference rules). This means that $F$ 
can be defined as follows: 

if $S_l$ is xxx and $S^2_l$ is yyy and $r_l$ is zzz and ... then $f_i$ is qqq ... 

where xxx, yyy, zzz, and qqq are properly defined fuzzy sets. Then, the vector 
$[f_1, \ldots, f_n]$ can be obtained by applying one of the well-known defuzzification 
methods. Of course, the rules defining $F$ do not have to contain all the parameters 
(i.e., "goodness" measures) as their antecedents. By selectively fine-tuning the rules, 
experts can concentrate on one of the outputs (i.e., oil, water, or gas) of the simulator. 

We define the adjustment vector for well $i$: 
$A_i = F(S_l, S_w, S_g, S^2_l, S^2_w, S^2_g, r_l, r_w, r_g) \times E_i$. 

Finally, the total adjustment is $A = \sum_i A_i$, and the next input is $X_{new} = X + A$. 
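One controller step can be sketched as follows, under heavy simplifying assumptions that are not taken from the paper: only the signed oil error is used as an antecedent, the three triangular fuzzy sets and their numeric ranges are invented for illustration, the same scalar output is applied to every component of F, the product of F and the weight vector is taken element by element, and defuzzification is a weighted average of crisp rule outputs rather than any method the authors name.

```python
def tri(x, a, b, c):
    """Triangular membership function with feet at a and c and peak at b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)


def signed_error(history, simulated):
    """Sum over wells and years of (history - simulated) for one fluid."""
    return sum(h - s
               for h_well, s_well in zip(history, simulated)
               for h, s in zip(h_well, s_well))


# Hypothetical rule base: (membership function of the error, crisp output f).
RULES = [
    (lambda s: tri(s, -200.0, -100.0, 0.0), -10.0),  # error negative: decrease
    (lambda s: tri(s, -100.0, 0.0, 100.0), 0.0),     # error near zero: hold
    (lambda s: tri(s, 0.0, 100.0, 200.0), 10.0),     # error positive: increase
]


def adjustment(error, n):
    """Evaluate the rules and defuzzify by weighted average of rule outputs."""
    error = max(-100.0, min(100.0, error))   # saturate outside the rule range
    weights = [mu(error) for mu, _ in RULES]
    f = sum(w * out for w, (_, out) in zip(weights, RULES)) / sum(weights)
    return [f] * n                           # same f for every input component


def next_input(x, history, simulated, impacts):
    """Return X + A, where A sums F times E_i (element-wise) over all wells."""
    f = adjustment(signed_error(history, simulated), len(x))
    total = [sum(f[j] * e[j] for e in impacts) for j in range(len(x))]
    return [xj + aj for xj, aj in zip(x, total)]
```

Here `history` and `simulated` are lists of per-well yearly oil series and `impacts` is the list of expert weight vectors, one per well. A fuller version would add rules over the squared errors and correlation coefficients and would produce a separate output for each input component.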



4.3 Neural Networks: Modeling 



The history matching problem becomes extremely difficult when one attempts to 
solve it in full scale, i.e. covering all wells in the reservoir during its production 
history. 

Due to the secondary input parameters (such as relative permeability, saturation, 
etc.) and the number of layers that must be taken into consideration to obtain good 
matches, the exponentially increasing problem size poses a tremendous challenge to 
simulation engineers. In such cases, we first use the fuzzy controller to find a set of 
suitable values for the secondary input parameters for MASTER to generate a series of 
production curves whose range covers the actual history curve. Then the curves are 
properly sampled to obtain a training set for a multilayer neural network. After the 
network is adequately trained, it is used to predict the set of values that, when input to 
MASTER, produces a reasonably well-matched output. The method used here is 
similar to training neural networks to solve the inverse problem (Sung et al., 1999). 

¹ Let us denote $A - B = \sum_i (a_i - b_i)$ and $(A - B)^2 = \sum_i (a_i - b_i)^2$. Also, let $cc(A,B)$ denote the 
correlation coefficient of $A$ and $B$. 



5 Conclusions 

In this paper we have presented a new approach for performing history matching, 
which is an important and challenging problem in petroleum reservoir simulation. 
Our approach addresses two issues that have precluded the widespread use of reservoir 
simulators: the need for expensive, high-performance computers; and the need for 
intensive human intervention in parameter adjustment. The fuzzy controller approach to 
parameter adjustment will eliminate the need for manual adjustment by human 
experts; and MASTER Web provides an easy, effective way of parallelizing the 
history matching process on a cluster of PCs, thus achieving a significant practical 
speedup compared to high performance computers. Consequently our proposed 
approach will be more attractive to oil producers, specifically the smaller ones. 

The preliminary results that have been obtained are highly encouraging and 
indicate the great potential of our approach and future research. The general fuzzy 
control algorithm and neural network modeling, described in a high-level fashion, are 
an ongoing effort of the project. Other methods based on fuzzy logic, neural 
networks, and genetic algorithms are also being explored to solve this 
optimization problem. Various experimental results are available on our website, 
http://baervan.nmt.edu/~co2/project/. 

We would also like to gratefully acknowledge the support for this research 
received from the U.S. Department of Energy, Sandia National Laboratories, and the 
State of New Mexico. 



References (Partial List) 

1. Ammer, J.R., Brummert, A.C., and Sams, W.N., "Miscible Applied Simulation 
Techniques for Energy Recovery - Version 2.0," Report to U.S. DOE, Contract 
No. DOE/BC-91/2/SP, February 1991. 

2. Chang, S.-H. and Grigg, R. B., "History Matching and Modeling the CO2-Foam 
Pilot Test at EVGSAU," paper SPE 39793 presented at the 1998 SPE 
Permian Basin Oil and Gas Recovery Conference, Midland, TX. 

3. Jang, J.-S. R., Sun, C.-T., and Mizutani, E., Neural-Fuzzy and Soft Computing, 
Prentice-Hall, 1997. 

4. Klir, G.J and Yuan, B., Fuzzy Sets and Fuzzy Logic, Theory and Applications, 
Prentice-Hall, 1995. 

5. Sung, A.H., H.J. Li, S.H. Chang, and R.B. Grigg, "Solving Nonlinear 
Engineering Problems with The Aid of Neural Networks," Proceedings of SPIE, 
Vol. 3812 (Applications and Science of Neural Networks, Fuzzy Systems, and 
Evolutionary Computation II), 1999, pp. 188-198. 




Forest Ecosystem Management via the NED Intelligent 
Information System 



W.D. Potter¹, X. Deng¹, S. Somasekar¹, S. Liu¹, H.M. Rauscher², and
S. Thomasma²



¹ Artificial Intelligence Center, GSRC 111, University of Georgia, Athens, GA
potter@cs.uga.edu

² USDA Forest Service, Bent Creek Experimental Forest, Asheville, NC



Abstract. We view an Intelligent Information System (IIS) as composed of a
unified knowledge base, database, and model base. This allows an IIS to
provide responses to user queries regardless of whether the query process
involves a data retrieval, an inference, a computational method, a problem
solving module, or some combination of these. The unified integration of these
components in a distributed environment for forest ecosystem management is
the focus of our continuing research.



1 Introduction 

In the past decade, organizations have been moving mainframe-based systems toward
open, distributed computing environments. The demand for interoperability has been 
driven by the accelerated construction of large-scale distributed systems for 
operational use and by increasing use of the Internet [3]. Distributed computing 
offers many advantages, including location transparency to users, scalability, fault 
tolerance, load balancing and resource sharing. As such, much of the interoperability 
literature has been concerned with distributed computing; for example, the recent 
emergence of Java, the Object Management Group's CORBA (Common Object
Request Broker Architecture) and Microsoft's DCOM (Distributed Component Object
Model) are all for this purpose [1], [11]. In addition, object orientation (OO) is
probably the most widely used approach in software development and the basis for
the CORBA and DCOM interoperability architectures. OO makes it easier to
maintain software modules, and makes it possible to re-use existing software objects. 
Consequently, platform independence, as well as language independence, has been a 
major focus in these interoperability architectures. 

The interoperability architecture for our unified, distributed knowledge/data/model 
management approach is based on a combination of DCOM and Active KDL (Active 
Knowledge/Data Language). DCOM provides the middleware protocol necessary to 
handle distributed interaction among our forest ecosystem management applications. 
It is a built-in component of the Microsoft Windows NT and Windows 98 operating systems.
Active KDL follows the functional and object-oriented paradigms [8], [9]. Active
KDL is based on the hyper-semantic data model KDM (Knowledge/Data Model),
developed in the mid-1980s [12]. By hyper-semantic, we mean a data (information)
model capable of capturing even more of the meaning of an application area than
is captured by traditional, semantic, or object-oriented models [13].

R. Loganantharaj et al. (Eds.): IEA/AIE 2000, LNAI 1821, pp. 629-638, 2000.

© Springer-Verlag Berlin Heidelberg 2000







A typical forest ecosystem management decision support system (FEM-DSS) 
contains a user interface, database, geographical information system (GIS), possibly a 
knowledge base, simulation and optimization models, help/hypertext management, 
data visualization, and decision methods [16]. Recent reviews have found that at
least 30 FEM-DSSs have been developed for use in the United States [10], [16].
Different FEM-DSSs support different parts of the ecosystem management process.
Functional service modules, also known as problem solving modules (PSMs), provide
specialized support for one or a few phases of the forest ecosystem management
process. Full-service FEM-DSSs, on the other hand, attempt to be comprehensive by
offering support for the complete forest ecosystem management process [16].

Most FEM-DSSs were developed independently of one another. As a result, they 
are typically large, monolithic, stand-alone systems. Although, collectively, the
existing FEM-DSSs are capable of addressing the full range of support required for the
management of a complex forest ecosystem, no one system has been found to be
completely satisfactory [10], [16]. An ideal FEM system, therefore, requires many of
the available DSSs working together. This necessitates the integration of both functional
service and full-service systems. Besides, it is often more cost-effective to re-use
existing software than to develop custom software when an existing FEM-DSS is to 
be enhanced to provide additional services. 

To achieve integrated operation of FEM-DSSs, it is necessary to overcome the
problems presented by the variety of platforms these legacy systems run on and the
heterogeneity of their development environments. Existing FEM-DSSs have been
written in different programming languages, reside on different hardware platforms,
and have different data access mechanisms and different component/module
interfaces. To date, efforts to achieve interoperability between FEM-DSS modules
have used ad hoc techniques yielding unique, point-to-point custom solutions. While
such unique solutions work, they are typically very difficult for other developers to
maintain and extend because of their idiosyncratic nature. No comprehensive, theory-based
interoperability standard currently exists for achieving integrated operation of
FEM-DSSs [16].

A cooperative research program between scientists of the Artificial Intelligence 
Center at the University of Georgia and the USDA Forest Service is currently
underway to develop an Intelligent Information System (IIS) component for NED, a
forest ecosystem management Decision Support System [20]. NED is a
comprehensive full-service FEM-DSS. It supports knowledge, data, and model
management in a distributed environment. The immediate goal is the seamless
integration of several existing, loosely coupled legacy USDA Forest Service systems
within the NED architecture. Additional systems will be added at a later time using 
the interoperability standard designed and tested in this initial effort. 

In this paper, we focus on three important facets of NED. The first facet deals 
with how a forest manager would interact with our system. A manager interfaces 
with it via a standard client process that may provide knowledgeable assistance. The 
knowledge driving the interface and controlling the decision models is maintained 
within the interface controller. The interface controller is based on the notion of 
query driven processing. That is, a user specifies a query to the NED system and the 
interface controller determines how best to respond to the query. The response may 
entail a variety of events taking place. Providing the hand-shaking architectural
support for our distributed environment is the second facet. We use Microsoft's
Distributed Component Object Model (DCOM) to facilitate the distributed integration 
of the decision model components [6], [7]. The integrated decision model 
components of our intelligent information system prototype (the third facet) are FVS, 
SILVAH, FIBER, and NITROGEN. FVS is a Forest Vegetation Simulator that 
projects the growth of forest stands under a variety of conditions [19]. SILVAH [5] is 
the SILViculture of Allegheny Hardwoods prescription system that aids forest 
managers in making treatment decisions. FIBER [18] is a forest stand growth 
projection system that deals with interactions of a variety of tree species in the stand. 
NITROGEN is a nitrogen recycling simulation package currently under development. 
It is designed to predict nitrogen flow through various “pools” within the forest (e.g.,
soil, animal, ground litter, tree, and air).

The organization of this paper follows the order of the three NED facets. We
first discuss the approach we take for providing distributed interoperable functionality.
We then briefly discuss the “middleware” layer of our approach (i.e., DCOM). Finally, we
provide an overview of the integrated decision model components and our interface
controller prototype.



2 Conceptual Architecture 

Before discussing the details of Active KDL, we want to define our view of 
interoperability. One solution to many of the problems in systems integration is 
interoperability of software systems [15], [2]. Interoperability is the ability of two or
more software components to cooperate by exchanging services and data with one
another, despite the possible heterogeneity in their language, interface and hardware
platform [17], [21]. Interoperable systems provide a software standard that promotes
communication between components and provides for the integration of legacy and
newly developed components.

The Active KDL Knowledge Data Base System [8], [9] is capable of representing 
information in different forms: Stored Data, Rules, Constraints, Models, and Problem 
Solving Modules. These forms allow information to be retrieved, derived, checked,
generated, and produced, respectively. From an Active KDL user's point of view,
queries may be answered by simple data retrieval, complex query processing,
query processing requiring heuristic/problem solving knowledge, model instantiation, or
module instantiation.

Model instantiation may occur when Active KDL does not have sufficient data or 
knowledge to provide a satisfactory user response by other means. In such a case,
Active KDL automatically creates model instances that are executed to generate
enough data to give a satisfactory reply to the user. Depending on the complexity of 
the query, model instantiation may be a simple or quite complex process. The process 
centers on the creation of sets of input parameter values that are obtained by schema 
and query analysis. Model instantiation has the potential to require an enormous 
amount of computation in response to a query. Therefore, control heuristics (explicit 
meta-knowledge) must be provided to control the amount of computation. Module 
instantiation is very similar to model instantiation except that it uses the
query-specific parameters to identify, instantiate, and start the execution of a
problem-solving module. In the event that some aspect of the problem is unavailable (or not
derivable) from the NED-IIS, the user will be prompted for the information.
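
As a rough illustration of model instantiation, the sketch below builds input parameter sets from the values a query leaves open and caps the number of model runs, a crude stand-in for the control heuristics mentioned above. The function instantiate_model, the max_runs limit, and the toy growth model are hypothetical names chosen for this example; they are not part of Active KDL.

```python
from itertools import islice, product

def instantiate_model(model, fixed, open_ranges, max_runs=10):
    """Run `model` once per combination of open parameter values.

    fixed       -- parameters taken directly from the query
    open_ranges -- candidate values for parameters the query leaves open
    max_runs    -- control heuristic: upper bound on model executions
    """
    names = sorted(open_ranges)
    combos = product(*(open_ranges[n] for n in names))
    results = []
    for combo in islice(combos, max_runs):
        params = dict(fixed, **dict(zip(names, combo)))
        results.append((params, model(**params)))
    return results

# Hypothetical toy model: basal area after `years` of growth at `rate`.
def toy_growth(basal_area, rate, years):
    return basal_area * (1 + rate) ** years

runs = instantiate_model(
    toy_growth,
    fixed={"basal_area": 100.0},
    open_ranges={"rate": [0.0, 0.1], "years": [1, 2, 3]},
    max_runs=4,
)
```

Here six parameter combinations are possible, but only the first four are executed because of the cap. A real control heuristic would presumably choose which combinations to run rather than simply truncating the list.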

An example of a problem-solving module is a genetic algorithm-based diagnosis 
system [14]. This type of PSM would be used to determine the most likely set of 
disorders that best explains a set of symptoms. The input needed is the set of 
symptoms that indicates that the forest under consideration is not achieving its 
specified goals. The output includes the diagnosis or set of forest components that are 
causing the problem(s). The domain knowledge used to guide the heuristic search for 
the solution would be acquired and placed within easy access of the diagnosis module 
in the IIS. 
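
To make the idea of a genetic algorithm-based diagnosis module more concrete, here is a minimal sketch. It is not the algorithm of [14]; it assumes a simple set-covering formulation in which each disorder is known to produce a set of symptoms, and the fitness rewards coverage of the observed symptoms first and parsimony (fewer disorders) second. The function name, the parameters, and the example disorders are hypothetical.

```python
import random

def ga_diagnose(causes, symptoms, generations=40, pop_size=20, seed=0):
    """Search for a small set of disorders that explains all symptoms.

    causes   -- disorder name -> set of symptoms it can produce
    symptoms -- set of observed symptoms
    """
    rng = random.Random(seed)
    names = sorted(causes)
    n = len(names)

    def fitness(bits):
        chosen = [names[i] for i in range(n) if bits[i]]
        covered = set().union(*(causes[d] for d in chosen)) & symptoms
        # Coverage dominates; fewer disorders (parsimony) breaks ties.
        return len(covered) * (n + 1) - len(chosen)

    # Start with the "every disorder present" hypothesis plus random guesses.
    pop = [[1] * n] + [[rng.randint(0, 1) for _ in range(n)]
                       for _ in range(pop_size - 1)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        nxt = [pop[0][:]]                      # elitism: keep the best
        while len(nxt) < pop_size:
            a, b = rng.sample(pop[:pop_size // 2], 2)
            cut = rng.randrange(1, n) if n > 1 else 0
            child = a[:cut] + b[cut:]          # one-point crossover
            if rng.random() < 0.2:
                child[rng.randrange(n)] ^= 1   # bit-flip mutation
            nxt.append(child)
        pop = nxt
    best = max(pop, key=fitness)
    return {names[i] for i in range(n) if best[i]}

# Hypothetical domain knowledge and observations.
causes = {
    "drought": {"wilting", "dieback"},
    "beetles": {"dieback", "boreholes"},
    "fungus": {"leaf_spots"},
}
diagnosis = ga_diagnose(causes, {"wilting", "dieback"})
```

Because the initial population contains the hypothesis that every disorder is present and the best individual is always carried forward, the returned set explains every symptom that the known disorders can explain; how small the set is depends on the search.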

In our IIS approach, a problem-solving module is invoked in much the same way 
that a model is invoked. That is, whenever a user query is presented to the IIS and
the other forms of response processing fail to produce results, a PSM may provide the
proper response. The IIS component that prepares the plan of action
to be taken by the query processor uses its available meta-knowledge to determine the
appropriate response path. After determining that a PSM is appropriate, certain
parameters are taken from the query specification (as in query driven simulation), as
well as from the information content of the IIS (possibly via the application of
meta-knowledge). These items are used to instantiate the selected PSM.
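
The fallback behavior described above can be sketched as an ordered list of response paths that are tried in turn. This is only an illustration under simplifying assumptions: the handler functions and the stored fact are hypothetical, and a real IIS would choose the path using meta-knowledge rather than a fixed order.

```python
def answer_query(query, handlers):
    """Try each response path in order; return the first non-empty result.

    `handlers` is an ordered list of (name, function) pairs. Each function
    returns a result, or None when that path cannot answer the query.
    """
    for name, handler in handlers:
        result = handler(query)
        if result is not None:
            return name, result
    return None, None

# Hypothetical stand-ins for the IIS response paths.
stored = {"stand_age": 42}

def retrieve(query):
    return stored.get(query)

def infer(query):
    # One toy rule: a stand older than 40 years is "mature".
    if query == "maturity" and "stand_age" in stored:
        return "mature" if stored["stand_age"] > 40 else "young"
    return None

def run_psm(query):
    # Last resort: a problem-solving module handles diagnosis queries.
    return "diagnosis pending" if query == "diagnosis" else None

paths = [("retrieval", retrieve), ("inference", infer), ("psm", run_psm)]
```

With these definitions, `answer_query("maturity", paths)` is answered by the inference path, while `answer_query("diagnosis", paths)` falls through to the PSM.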

An important feature of the NED-IIS user interface is the recognition of 
unavailable (and necessary) parameter values. The interface would prompt the user 
for these values and save them for later use if necessary. For example, if a stand's 
information has been specified and certain field test results have been given to the 
NED-IIS, the stand manager would be in a position to query for the diagnosis (the 
collection of disorders that were causing the stand to deteriorate). A diagnostic 
problem-solving module defined for the domain would be invoked to respond to the 
query. Critical information that was unavailable to the diagnostic module would need 
to be provided before the module could be executed. The user would be informed of 
the missing items and prompted for their values. 
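
A minimal sketch of this behavior, assuming parameters are kept in a dictionary and that a callback stands in for the user dialog, might look as follows. The function names and the example parameters are hypothetical.

```python
def missing_parameters(required, known):
    """Return the required parameter names that have no value yet, in order."""
    return [name for name in required if known.get(name) is None]

def collect_parameters(required, known, ask):
    """Fill in missing values by calling `ask(name)`; keep them for later use."""
    for name in missing_parameters(required, known):
        known[name] = ask(name)
    return {name: known[name] for name in required}

# Hypothetical inputs for a diagnostic module.
known = {"species": "oak", "soil_ph": None}
required = ["species", "soil_ph", "crown_dieback"]
answers = collect_parameters(required, known, ask=lambda name: f"<{name}>")
```

In an interactive setting, `ask` could be a function that prompts the user. The values are written back into `known`, so a later query does not have to request them again.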

The decision making process on how best to answer a query is divided into two 
phases: Strategic Planning and Tactical Planning. The strategic planner is responsible
for making decisions such as how many answers to give and which combination of
bases (knowledge, data, model/module) to access. Tactical planning
involves detailed decision making on how to achieve a specific, concrete goal. The
techniques used are algorithmic, although in many cases the algorithms will need to
be heuristic (e.g., query optimization is NP-hard). For query optimization while
accessing a database, a detailed plan is produced indicating what operations (e.g., 
join), access paths, and indices are to be used to retrieve the appropriate data. For rule 
selection, given a query that requires new data to be inferred, rules will be 
selected/indexed based upon their relevancy, and inference will be carried out by 
forward and/or backward chaining depending on the situation (a combined 
forward/backward approach is sometimes appropriate). 
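
As an illustration of the inference step, the following sketch implements plain forward chaining over propositional rules. It is a generic textbook version, not the Active KDL inference engine, and the rules and facts are invented for the example.

```python
def forward_chain(facts, rules):
    """Apply rules until no new facts can be derived.

    Each rule is (premises, conclusion); it fires when all premises hold.
    """
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if conclusion not in derived and set(premises) <= derived:
                derived.add(conclusion)
                changed = True
    return derived

# Hypothetical rules and observations.
rules = [
    (("low_stocking", "poor_regeneration"), "regeneration_problem"),
    (("regeneration_problem", "high_deer_impact"), "recommend_fencing"),
]
facts = {"low_stocking", "poor_regeneration", "high_deer_impact"}
conclusions = forward_chain(facts, rules)
```

The second rule can fire only after the first has added its conclusion, which is why the loop repeats until nothing changes. Backward chaining would instead start from the queried conclusion and work toward the facts.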

3 DCOM - The Middleware 

Based on an evaluation of a number of systems, and because most existing forest 
decision support systems run on Microsoft Windows platforms, we selected a
DCOM-based framework for the integration of forest decision support
applications [2]. An earlier prototype using NED and FVS as example applications 
demonstrated the effectiveness and appropriateness of integrating legacy and newly 
developed applications using DCOM. Compared with our previous experience with
CORBA [4], the implementation also indicated that DCOM programming is much easier
and more productive. This is because we only focus on the application-specific
implementation while the framework does many routine tasks, for example, 
generating the templates necessary for creating DCOM objects and registering the 
applications. An additional advantage with DCOM is that we can develop user- 
friendly interfaces using Microsoft resources. The difficulty with DCOM is that, like 
CORBA, DCOM is a long and complicated specification that takes a good deal of 
time to master. 




Figure 1 NED Controller with DCOM-based interoperable architecture 



Conceptually, the general structure of our DCOM-based framework for integration 
has three major components: the caller, the controller that has DCOM as its 
middleware, and the applications (see Figure 1). A caller is an entity that issues a 
request to an application via the controller, and usually acts as an interface between 
the entire integrated system and the user. This interface is visual and functions 
differently than those of application objects in the system. It does not have a 
corresponding implemented object. Rather, it only gives a “look and feel” of the 
integrated system. The caller can interact with one or more applications to 
accomplish its work. 

The controller (also called the intelligent information server) is responsible for 
locating and activating applications. More importantly, it controls interactions 
between the caller and an application, and between applications. The controller uses 
DCOM as its backbone, since DCOM provides many system services that facilitate 
the registration and finding of application components, and the control of and 












634 W.D. Potter et al. 



communications between them. While running an application, the controller has the 
duty of managing the dialog with the user, for example, screen handling, data entry 
and validation, dialog box control, and menu interpretation. Sometimes, the 
controller may have the duty to display to the user the results passed to it by an 
application. Adding a new application to or replacing an existing one in the 
integrated system has minimal effect on how the framework looks to the user due to 
its “plug-and-play” nature. Depending on the complexity and need of the integrated 
system, the controller may also contain processing rules that help interpret the 
requests and instructions supplied by the user. Therefore, only through the controller 
can the applications participate in a coordinated fashion with the integrated 
information system. 
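
The caller/controller/application structure can be sketched in a few lines. The sketch below is an in-process toy that uses no DCOM at all; it only shows the registry-and-routing idea. The class, its methods, and the example application are hypothetical.

```python
class Controller:
    """Toy controller: registers applications and routes all requests."""

    def __init__(self):
        self._apps = {}
        self.log = []          # record of which applications were activated

    def register(self, name, app):
        self._apps[name] = app

    def unregister(self, name):
        self._apps.pop(name, None)

    def request(self, name, **inputs):
        if name not in self._apps:
            raise LookupError(f"no application registered as {name!r}")
        self.log.append(name)
        return self._apps[name](**inputs)

# The caller talks only to the controller, never to an application directly.
controller = Controller()
controller.register("growth", lambda trees, years: trees + 5 * years)
volume = controller.request("growth", trees=100, years=2)
```

Adding or replacing an application amounts to one `register` call, which is one way to read the "plug-and-play" property described above.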

An application is a component that provides services to the integrated system. 
Many of the forest decision support applications focus on single simulation, display, 
input/output, or analysis tasks. Each application is encapsulated within an interface 
that follows a standard format (we use MIDL, the Microsoft Interface Definition
Language). This approach makes it possible for the application to communicate with 
the rest of the framework, such that other applications can use the interface to access 
the services this application provides. Interfacing also provides an effective way to 
deal with legacy applications. Many legacy applications were developed with a 
stand-alone purpose. Their data and functionality may not be readily available to 
other applications; the APIs of those legacy applications may be proprietary, limited, 
or even lacking. Newly constructed interfaces to the legacy applications act like 
adapters so that these legacy applications and the rest of the framework can work 
together, hence enabling re-use of existing applications. 
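
The adapter role of such an interface can be illustrated as follows. This is a sketch in Python rather than MIDL; the Application interface, the legacy function, and its comma-separated record format are all assumptions made for the example.

```python
from abc import ABC, abstractmethod

class Application(ABC):
    """Hypothetical standard interface that every wrapped application exposes."""

    @abstractmethod
    def run(self, inputs: dict) -> dict:
        ...

def legacy_growth(record: str) -> str:
    """Stand-in for a legacy program that reads and writes fixed text records."""
    stand_id, trees = record.split(",")
    return f"{stand_id},{int(trees) + 10}"

class LegacyGrowthWrapper(Application):
    """Adapter: translates dict inputs to the legacy record format and back."""

    def run(self, inputs: dict) -> dict:
        record = f"{inputs['stand_id']},{inputs['trees']}"
        stand_id, trees = legacy_growth(record).split(",")
        return {"stand_id": stand_id, "trees": int(trees)}

result = LegacyGrowthWrapper().run({"stand_id": "S1", "trees": 120})
```

The legacy code is left untouched; only the wrapper knows its record format, so the rest of the framework sees the same `run` interface for every application.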

The architectural design should be general purpose, meaning that the framework 
should have distributed processing capability and provide platform independence so 
that it is ready to work in heterogeneous, cross-network environments if it is required 
to do so in the future. Overall, the design is general and makes no assumptions about 
the software applications to be integrated. Its standardized interface scheme enables 
integration of a variety of applications. It is an open framework in the sense that 
application components can be added and/or removed easily without drastically 
affecting the functionality of the whole system. The adoption of DCOM as the 
middleware supports this design. 

4 The NED-IIS Prototype 

Currently, we are integrating three legacy Forest Service applications, namely the Forest
Vegetation Simulator (FVS), FIBER, and SILVAH, using the DCOM-based
framework. The next application to be included is the NITROGEN simulation. The
following paragraphs discuss these applications in detail. 

FVS is a system that uses common forest inventory information and a 
growth/yield model for projecting the growth of forest stands. FVS simulates growth 
and yield for major forest species, forest types, stand conditions and a wide range of 
silvicultural treatments. It is used in the U.S. forest management field and there are 
currently more than 20 variants for 20 different geographic regions covering much of 
the commercial forestland in the U.S. Numerous post processors have been 
developed for FVS. Post processors are independently developed computer programs
that can be used along with FVS to further process the FVS simulation results for
specific analysis needs. For example, the Average Summary Table post processor 
calculates an average summary table from the stand output of FVS. Typically, the 
simulation results from the FVS are used as input to the post processors that perform 
additional analysis. This feature makes the FVS a very useful and versatile tool 
because virtually limitless applications can be developed in the form of post 
processors without major modifications to the core FVS system. FVS is written in
Fortran 77 and runs on PCs and UNIX workstations [19].

FIBER is a forest growth model that predicts the growth interactions among 
species of the Northeastern United States over a complete range of forest treatments 
(clear cutting to unmanaged stands), stand densities, harvest intervals, species 
composition, and different ecological land classifications. The user can use the 
software to predict the growth and yield over a specified time interval for individual 
forest stands or large forested areas. FIBER can be used for both even-age stand and 
multi-age stand management [18]. 

SILVAH is a forest prescription model used for prescribing silvicultural
treatments for hardwood stands of the Alleghenies. It identifies important factors and
determines how they function in regulating regeneration or stand growth. SILVAH
also develops objective guidelines and prescribes optimal silvicultural treatments to
achieve management goals. These guidelines have been integrated into a complete
stand analysis and prescription procedure that provides a systematic way of measuring
and evaluating critical stand conditions. This data is then used to arrive at a
recommended treatment. The stand inventory and site factors are summarized and
analyzed to evaluate the stand's potential growth and regeneration. Then, decision
tables are used to determine the proper prescription procedure based on the critical levels
of the various factors in combination with landowner objectives [5].

NITROGEN is a nitrogen recycling simulation system designed to predict the
values of various nitrogen pools in northern hardwood forests by synthesizing the 
existing nitrogen recycling models. The system will also be used to examine the 
effects of various combinations of site quality, stand structure (density and species 
composition), and silvicultural treatments on the size of the nitrogen pools. The 
inputs required by the system include the stand quality (site index), species list, tree 
size, and any treatments that may have been applied to the forest. The output of the 
system is the value of available soil nitrogen that will be further used as an index in 
comparing different treatment scenarios. Keep in mind that NITROGEN is currently 
under development and is not yet integrated within our NED-IIS architecture. 

The communication between these applications and the client interface is through 
the intelligent information server or the controller. The design makes no assumptions 
about the locations of the applications. The applications may reside on the local 
machine or on a remote machine. The controller is responsible for locating and 
activating the application whether it resides locally or remotely. The controller also 
contains processing rules that help interpret the user's needs. These processing rules 
determine the kind of information that must be collected from the user. Thus there is 
continuous interaction between the caller and the controller. The processing rules 
also determine which application to activate. 

For example, if the user wants to have an assessment of bark beetle risk factors in 
a forest stand, the controller will decide that FVS is the right application to be used. It 




636 W.D. Potter et al. 



will also determine the right post processor to be used and guide the user through a 
visual interface to supply appropriate information that is needed to run FVS with this 
particular post processor. If the user wants to know the appropriate silvicultural 
treatments to achieve a certain management goal, the controller will decide to invoke 
SILVAH. If more than one application needs to be activated, the processing rules 
will determine the dependency between the applications and activate the applications 
in the right order. Under such a scenario, the output from one application may be 
needed as input to another application. The communication between applications is 
also routed through the controller. The controller activates the first application, and 
after receiving the output it may modify the output to achieve an acceptable input
format for the second application. It then invokes the second application with this
input. The controller also processes all outputs from the applications.

Figure 2 Flowchart of the NED controller for FVS
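
The ordering of dependent applications described above can be sketched with a topological sort. The example assumes Python 3.9 or later for the standard graphlib module; the application names, the dependency table, and the adapt function are hypothetical placeholders for the controller's processing rules and format conversions.

```python
from graphlib import TopologicalSorter  # Python 3.9+

def run_in_order(apps, depends_on, adapt, initial):
    """Run applications so that each one starts after those it depends on.

    apps       -- name -> function taking and returning a dict
    depends_on -- name -> set of names whose output it needs
    adapt      -- function(name, data) reshaping data into that app's input
    """
    data = dict(initial)
    order = list(TopologicalSorter(depends_on).static_order())
    for name in order:
        data.update(apps[name](adapt(name, data)))
    return order, data

apps = {
    "growth": lambda d: {"volume": d["trees"] * 2},
    "analysis": lambda d: {"risk": "high" if d["volume"] > 100 else "low"},
}
depends_on = {"growth": set(), "analysis": {"growth"}}

def adapt(name, data):
    # Placeholder for format conversion: pass along only the fields each app reads.
    wanted = {"growth": ["trees"], "analysis": ["volume"]}[name]
    return {key: data[key] for key in wanted}

order, data = run_in_order(apps, depends_on, adapt, {"trees": 60})
```

All data passes through `run_in_order`, mirroring the way communication between applications is routed through the controller rather than going from one application to another directly.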

Consider the FVS example where the controller determines that FVS needs to be 
used to satisfy the user’s query (e.g., in a nitrogen treatment query, a treatment is 
applied to the forest, FVS is run to predict forest growth, another treatment is applied, 
FVS is run again, and then an analysis of the nitrogen content is provided to the user 
as the query result). This example requires the controller to call upon its FVS 
knowledge in order to activate the simulation properly. That is, in order to run FVS, 
several steps are required (see the organization in Figure 2). First, the forest data
is converted to a database format (this is in anticipation of a revised
version of NED to be available in the near future). Next, the stand data is extracted from the
database in the FVS format, the FVS pre-processor (called Suppose) is run to set various
FVS parameters, and the FVS input data and parameter files are sent to the FVS server (either
on the local machine or a remote machine). FVS is then run, and the output data is returned to the
controller and converted to the database format. Finally, the data is extracted from
the database and converted back into the NED format. Once NED has the appropriate
data available, the controller will invoke other rules to determine what to do next with
it, such as sending it to another module or informing the user. With the NITROGEN
scenario, the controller would continue to process the sequence of events to 
eventually get to the nitrogen analysis module. Note that the controller would have 
previously determined the whole sequence of steps using built-in planning knowledge 
of how to satisfy various user queries. The controller acts as the main communication 
and knowledge junction. 
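
The fixed sequence of conversion and execution steps for an FVS run can be pictured as a simple pipeline. In the sketch below the step names paraphrase the text, but the step bodies are placeholders that only relabel the data; none of them performs a real conversion or calls FVS or Suppose.

```python
def run_pipeline(steps, data):
    """Apply each (name, function) step in turn and record the order of execution."""
    trace = []
    for name, step in steps:
        data = step(data)
        trace.append(name)
    return data, trace

def tag(fmt):
    """Build a placeholder step that only relabels the data's format."""
    return lambda data: dict(data, format=fmt)

# Step names paraphrase the text; the bodies are placeholders, not real converters.
FVS_STEPS = [
    ("ned_to_database", tag("database")),
    ("extract_fvs_input", tag("fvs_input")),
    ("run_suppose", lambda d: dict(d, parameters_set=True)),
    ("send_to_fvs_server", lambda d: dict(d, location="server")),
    ("run_fvs", tag("fvs_output")),
    ("output_to_database", tag("database")),
    ("database_to_ned", tag("ned")),
]

result, trace = run_pipeline(FVS_STEPS, {"stand": "S1", "format": "ned"})
```

Keeping the plan as data (a list of named steps) is one way the controller could determine the whole sequence in advance and then execute it.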

It is necessary to route all communications through the controller because the 
entire integrated system is composed of independent heterogeneous applications. We 
can make no assumptions about the language, data types and functionality of the 
individual applications. Each application participates in the integrated system through 
its wrapper. The wrapper around the application provides an interface to the 
application in a standard format. The controller accesses the application through the 
interface provided by its wrapper. New interfaces permit legacy applications and the 
rest of the framework to integrate seamlessly and work together. Addition of other 
applications can be done without affecting the existing integrated system. 

5 Conclusions 

We have designed a DCOM-based framework for integration of legacy and future
forest decision support applications. The design makes no assumptions about the 
individual software applications and is therefore a general model that will permit 
seamless integration of other legacy applications and future applications. The 
adoption of DCOM as the middleware provides the model with the capacity to run 
remote applications. From the client's point of view, the location of the application is 
not an issue. The application can be run locally or remotely as needed. We have built 
a visual interface common to our three sample applications that is capable of locating 
and running FVS, FIBER or SILVAH on remote or local machines. This has been 
achieved by building wrappers around each of these applications. The caller is able to 
access the functionality of the applications through these wrappers. We are currently 
working on the IIS component of the system, by adding rules to the controller. In 
addition, we are adding new applications, for example the NITROGEN system. 

References 

1. Grimes, R. 1997. Professional DCOM Programming. Birmingham, U.K: Wrox 
Press Ltd. 565 p. 

2. Liu, S. 1998. Integration of Forest Decision Support Systems: A Search for 
Interoperability. Master's Thesis. Athens, GA: The University of Georgia. 122 p. 

3. Manola, F. 1995. Interoperability issues in large-scale distributed object 
systems. ACM Computing Surveys 27(2): 268-270. 

4. Maheshwari, S.S. 1997. A CORBA and Java Based Object Framework for 
Integration of Heterogeneous Systems. Master’s Thesis. Athens, GA: The University of Georgia. 92 p.

5. Marquis, D.A., R.L. Ernst, S.L. Stout. 1992. Prescribing Silvicultural Treatments
in Hardwood Stands of the Alleghenies (Revised). USDA, Gen. Tech. Report
NE-96.







6. Microsoft. 1997. Distributed Component Object Model Protocol - DCOM 1.0. 

7. Microsoft. 1998. DCOM: A business overview. 

8. Miller, J.A., K.J. Kochut, W.D. Potter, E. Ucar, and A. A. Keskin. 1991a. Query 
Driven Simulation in Active KDL: A Functional Object-Oriented Database 
System. International Journal in Computer Simulation. 1,1, pp.1-30. 

9. Miller, J.A., W.D. Potter, K.J. Kochut, A.A. Keskin, and E. Ucar. 1991b. The 
Active KDL Object-Oriented Database System and Its Application to 
Simulation Support. J. of O-O Programming - Special Issue on Databases. 4,
pp. 30-45. 

10. Mowrer, H. T., K. Barber, J. Campbell, N. Crookston, C. Dahms, J. Day, J. 
Laacke, J. Merzenich, S. Mighton, M. Rauscher, K. Reynolds, J. Thompson, P. 
Trenchi, and M. Twery. 1997. Decision Support Systems for Ecosystem 
Management: An Evaluation of Existing Systems. General Technical Report 
RM-GTR-296. Fort Collins, CO: USDA Forest Service, Rocky Mountain 
Forest and Range Experiment Station. 154 p. 

11. OMG. 1997. The common object request broker: architecture and 
specification. Version 2.1. OMG Document, Object Management Group. 

12. Potter, W.D. and L. Kerschberg. 1986. A Unified Approach to Modeling 
Knowledge and Data. Proceedings of the IFIP TC2 Conference on Knowledge 
and Data (DS-2). Pp.vl-v27. 

13. Potter, W.D., R.P. Trueblood, and C.M. Eastman. 1989. Hyper-Semantic Data 
Modeling. Data & Knowledge Engineering. 4, pp. 69-90. 

14. Potter, W.D., B.E. Tonn, M.R. Hilliard, G.E. Liepins, S.L. Purucker and R.T. 
Goeltz. 1990. Diagnosis, Parsimony, and Genetic Algorithms. Proc. of the 3rd 
Int. Conf. on Industrial & Engineering Applications of AI and Expert Systems,
pp. 1-8.

15. Potter, W. D., T. A. Byrd, J. A. Miller, and K. J. Kochut. 1992. Extending 
decision support systems: The integration of data, knowledge, and model 
management. Annals of Operations Research 38: 501-527. 

16. Rauscher, H. M. 1999. Ecosystem management decision support for public 
forests: A review. Forest Ecology and Management 114: 173-197. 

17. Sheth, A. 1998. Changing Focus on Interoperability in Information Systems: 
From System, Syntax, Structure to Semantics. Interoperating Geographic 
Information Systems. M.F. Goodchild, et al (eds). Kluwer Pub. Co. 

18. Solomon, D.S., D.A. Herman, W.B. Leak. 1995. FIBER 3.0: An Ecological 
Growth Model for Northeastern Forest Types. USDA, Gen. Tech. Report NE- 
204. 

19. Teck, R., M. Moeur, and B. Eav. 1996. Forecasting ecosystems with the forest
vegetation simulator. Journal of Forestry 94(12): 7-10. 

20. Twery, M.J., D.J. Bennett, R.P. Kollasch, S.A. Thomasma, S.L. Stout, J.F.
Palmer, R.A. Hoffman, D.S. DeCalesta, J. Hornbeck, H.M. Rauscher, J.
Steinman, E. Gustafson, G. Miller, H. Cleveland, M. Grove, B. McGuinness, N. 
Chen, and D. E. Nute. 1997. NED-1: An integrated decision support system for 
ecosystem management. Proceedings of the Resource Technology '97 Meeting, 
pp. 331-343. 

21. Wegner, P. 1996. Interoperability. ACM Computing Surveys 28(1): 285-287. 




Friendly Information Retrieval through 
Adaptive Restructuring of Information Space 



Tomoko Murakami, Ryohei Orihara, and Takehiko Yokota 



Information-Base Functions Toshiba Laboratory 
Real World Computing Partnership 
1, Komukai Toshiba-cho, Saiwai-ku, Kawasaki 212-8582, Japan 
{tomoko, orihara, takehiko}@eel.rdc.toshiba.co.jp 



Abstract. Although relevance feedback techniques are relatively com- 
mon in the field of information retrieval (IR), feedback usually supports 
a process of query refinement. Using feedback to restructure the infor- 
mation space itself has yet to be attempted. Restructuring not only sup- 
ports useful applications such as clustering, but is also indispensable 
for IR given that the modeling function employs inter-term correlation. 
This paper presents a new approach to relevance feedback involving in- 
formation space manipulation, and examines its effectiveness through a 
number of experiments. 



1 Introduction 

Global use of personal computers and rapid development of the Internet make 
it essential that electronic information is used effectively in both the social and 
economic arenas. Furthermore, it is necessary to select useful and relevant infor- 
mation from the deluge that confronts the modern computer user. 

The purpose of the research carried out in the Information-Base Functions 
(IBF) laboratory is to establish a framework for interpreting multimedia data 
from various user viewpoints and adjusting to them on a person-by-person basis. 
We have constructed a flexible cognitive framework for text information. In this 
paper we propose an adaptation of the document space using a vector that 
represents the user’s retrieval viewpoint. Experiments have been carried out on 
large collections, and processing time has been improved. 

2 Issues for IR 

When we construct a retrieval system, a fundamental but crucial problem is that 
we cannot predict the user query. From the user’s point of view, queries may 
sometimes lead to undesired documents. Relevance feedback is an approach that 
attempts to address this problem [1]. Relevance feedback by query refinement 



R. Loganantharaj et al. (Eds.): IEA/AIE 2000, LNAI 1821, pp. 639-644, 2000. 
© Springer-Verlag Berlin Heidelberg 2000 






that modifies the elements in a query vector (a query is also represented as a 
vector) in relationship to their occurrence in positive and negative documents is 
reported in [2]. While query refinement by adding new words to a query according 
to synonyms or a conceptual dictionary is also possible, it can still be difficult 
to express the user’s interests in a query due to its overall poor information 
content [3]. 

Relevance feedback by query refinement is a local modification in the sense 
that it aims to obtain desirable documents only for the current inquiry. Therefore, 
although the user’s query may include abbreviations or exaggerations that 
reflect the user’s latent semantic tendencies, the user’s viewpoint is not reused 
for subsequent retrieval after a series of retrieval processes. To address this 
issue, Raghavan proposed the reuse of past optimal queries [4]. Clearly it is 
possible to preserve the user’s viewpoint after a series of retrieval processes by 
modifying the function that computes the similarity between a query and a 
document. We can apply this method to the model if we assume each term 
corresponds to an orthogonal axis; however, there are in fact difficulties due to 
inter-term correlation. 

Some relevance feedback approaches restructure the document space based on 
co-occurrence, but unfortunately they can only identify positively 
related terms. Yu [5] developed a methodology that constructs term classes in 
such a way that the number of correct responses by the system is maximized. 
However, it is proved in [5] that the approach is computationally infeasible. 



3 Restructuring of Document Space 

We propose a relevance feedback method through a restructuring of document 
space to accomplish retrieval based on each user’s viewpoint. The approach 
incorporates restructuring of document space performed by optimization of a 
dictionary vector, where the modeling function employs inter-term correlation. 
Here the dictionary vector is denoted by a binary series of 0’s and 1’s which 
indicates those terms which are required by the document space to reflect the 
user’s interests. 

3.1 Modeling Based on Correlation of Terms 

To represent a document space we adopt the vector space model, in which each 
document in the space is represented as a vector consisting of index terms and r 
different index terms lie orthogonally to each other in the space. A document in 
the vector space is represented by an r-dimensional vector, in which each element 
corresponding to an index term is weighted according to its importance for the 
document by the tf-idf method [6, 7]. 

Various modeling approaches that consider the relationships between terms 
have been proposed. We adopt a simple model based on the intuition that neighboring 
terms in a document can be considered mutually correlated. We 
define $u(q, i)$ as the weight of the $i$-th index term $w_i$ in a document $q$, which is 
obtained by the following calculation and lies between 0 and 1. 

$$u(q, i) = \frac{\sum_j tf(q, j)\, idf(j)\, M_{ij}}{\sqrt{\sum_k \left[ \sum_j tf(q, j)\, idf(j)\, M_{kj} \right]^2}} \qquad (1)$$

where $M_{ij}$ represents a correlation between two index terms $w_i$ and $w_j$. 
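
As a rough illustration of expression (1), the following Python sketch spreads each term's tf-idf weight onto correlated terms and then normalizes the result to unit length. It assumes that the numerator of (1) is the correlation-weighted sum of tf-idf values and that tf, idf and the correlation matrix are non-negative; the function name `correlated_weights` and the toy correlation matrix are hypothetical and do not come from the paper.

```python
import math

def correlated_weights(tf, idf, M):
    """Return the weights u(q, i) of one document q for every index term i.

    tf[j]   : term frequency of term j in the document (assumed input)
    idf[j]  : inverse document frequency of term j (assumed input)
    M[i][j] : correlation between terms i and j (assumed non-negative)
    """
    r = len(tf)
    # plain tf-idf weight of each term
    base = [tf[j] * idf[j] for j in range(r)]
    # let every term receive weight from the terms it is correlated with
    raw = [sum(base[j] * M[i][j] for j in range(r)) for i in range(r)]
    # normalize so that every weight lies between 0 and 1
    norm = math.sqrt(sum(x * x for x in raw))
    return [x / norm if norm > 0 else 0.0 for x in raw]

# Term 1 does not occur in the document but is correlated with term 0,
# so it still receives a non-zero weight.
M = [[1.0, 0.5, 0.0],
     [0.5, 1.0, 0.0],
     [0.0, 0.0, 1.0]]
weights = correlated_weights([1, 0, 2], [1.0, 1.0, 1.0], M)
```

With an identity matrix for M the sketch reduces to length-normalized tf-idf, which is the baseline the correlation-based model is compared against.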

3.2 Optimization of Dictionary Vector 

We consider adaptation of the document space to the user’s mental model 
through manipulation of a dictionary vector that reflects term correlation. The 
best dictionary vector for the user is determined by a search guided by a per- 
formance measure through the dictionary vector space, which is a search tree 
defining a dictionary hierarchy that has a dictionary vector selected by the user 
as its initial node. The outline of the search algorithm is: 1) to generate 
candidates by heuristics through the search space, 2) to perform depth-first search 
with backtracking until the performance of the generated dictionary vector is 
improved, 3) to terminate the search if a candidate dictionary vector is not 
generated at the initial node of the search space. 

The first task, generating candidates, is implemented by assigning 0 to an 
element in the dictionary vector corresponding to a term which has not yet been 
selected. The term is identified by the heuristic that the largest 
difference between an element in the query and the corresponding element in the 
document ranked lowest by the user represents maximum user disinterest. This 
procedure is iterated N times at each node of the search space, where N is given 
as a program parameter. An algorithm corresponding to the latter two tasks is 
presented in Table 1. In order to measure the performance of the generated 
dictionary vectors we use feedback information, which consists of positive 
documents ranked in order of similarity by the user using an arbitrary number scale. 
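
The candidate-generation heuristic described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the query and the lowest-ranked document are taken to be weight vectors of equal length, the "difference" is taken as an absolute difference, and the function name `generate_candidate` and the `tried` set are hypothetical.

```python
def generate_candidate(dictionary, query, lowest_doc, tried):
    """Return (candidate, term_index), or None if no term is left to drop.

    dictionary : list of 0/1 flags, one per index term
    query      : weight of each term in the query (assumed vector form)
    lowest_doc : weight of each term in the document the user ranked lowest
    tried      : indices of terms already selected at this node
    """
    best, best_gap = None, -1.0
    for i, bit in enumerate(dictionary):
        if bit == 0 or i in tried:
            continue  # term already removed or already selected
        # a large gap is read as a sign of user disinterest in the term
        gap = abs(query[i] - lowest_doc[i])
        if gap > best_gap:
            best, best_gap = i, gap
    if best is None:
        return None
    candidate = list(dictionary)
    candidate[best] = 0  # drop the term from the dictionary vector
    return candidate, best

first = generate_candidate([1, 1, 1], [0.9, 0.1, 0.5], [0.1, 0.1, 0.4], set())
```

Calling the function N times at a node, adding each returned index to `tried`, corresponds to the N iterations per node mentioned in the text.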

Let Q be a query previously generated by the user and d' be a new candidate 
generated from the current dictionary vector d during the search. To evaluate 
the performance of d', the elements in the positive documents $D_1, \ldots, D_m$, denoted in 
the order in which the user ranked them, are first weighted according to d' using 
expression (1). Then the distances between Q and $D_1, \ldots, D_m$ are computed, 
which can be written as $distance_{d'}(D_1, Q), \ldots, distance_{d'}(D_m, Q)$. 
To compare d' with d in terms of the preservation of the order 
and the adaptation of the document space to the query, the following conditions 
are introduced. 

$$distance_{d'}(D_1, Q) \le \cdots \le distance_{d'}(D_m, Q) \qquad (2)$$

$$\sum_{i=1}^{m} distance_{d'}(D_i, Q) < \sum_{j=1}^{m} distance_{d}(D_j, Q) \qquad (3)$$

If the distance values satisfy these conditions, it is assumed that d' is improved 
compared with d since the search tree operates to define a dictionary hierarchy. 

The evaluation of the generated dictionary vectors by these conditions is 
continued at each node in the search space until a candidate dictionary vector 
is not generated at the initial node of the search space. 
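
A minimal Python sketch of the test expressed by conditions (2) and (3) is given below. It assumes the distances have already been computed under d' and under d and are listed in the user's ranking order, and it reads condition (2) as a non-strict ordering; the function name `satisfies_conditions` is hypothetical.

```python
def satisfies_conditions(new_dists, old_dists):
    """Check conditions (2) and (3) for a candidate dictionary vector d'.

    new_dists[i] : distance between Q and the (i+1)-th ranked document under d'
    old_dists[i] : the same distance under the current dictionary vector d
    """
    # condition (2): the user's ranking order is preserved under d'
    order_preserved = all(a <= b for a, b in zip(new_dists, new_dists[1:]))
    # condition (3): the positive documents move closer to the query overall
    closer_overall = sum(new_dists) < sum(old_dists)
    return order_preserved and closer_overall

improved = satisfies_conditions([0.1, 0.2, 0.3], [0.2, 0.3, 0.4])
```

The summed distance on the left of (3) can also serve as the performance measure that ranks accepted candidates against each other.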






Table 1. Dictionary vector search algorithm. 

d    : the current dictionary vector 
e    : the current best dictionary vector 
t[d] : the number of iterations by heuristics at node d (t[d] < N+1) 

e := d;                                   % initialization 
for each i do t[i] := 0. 
while d != null do begin                  % terminative condition 
    while t[d] < N do begin 
        d’ := a candidate dictionary vector from d. 
        d := a parent node of d’. 
        if d’ has not been selected in the past then 
            if d’ satisfies the conditions then 
                if d’ is superior to e then 
                    e := d’. 
                d := d’; break. 
            else 
                t[d] := t[d]+1. 
        else 
            t[d] := t[d]+1. 
    end 
    if t[d] == N then 
        d := a parent node of d. 
end. 
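
The search in Table 1 can be approximated by the Python sketch below. It is not the authors' implementation: candidate generation, the acceptance test and the performance measure are passed in as hypothetical callbacks (`generate`, `satisfies`, `score`), a lower score is assumed to be better, and, unlike Table 1, the attempt counter is advanced on every attempt so that termination is easy to see.

```python
def search_best_dictionary(root, n_tries, generate, satisfies, score):
    """Depth-first search with backtracking over dictionary vectors.

    root      : initial dictionary vector (a tuple of 0/1 flags)
    n_tries   : N, the number of heuristic attempts allowed at each node
    generate  : generate(d, k) -> k-th candidate derived from d, or None
    satisfies : satisfies(d_new, d) -> True if conditions (2) and (3) hold
    score     : score(d) -> float, lower is better (assumption)
    """
    parent = {root: None}
    tries = {root: 0}
    seen = {root}
    best = d = root
    while d is not None:              # the parent of the root is None
        advanced = False
        while tries[d] < n_tries:
            cand = generate(d, tries[d])
            tries[d] += 1
            if cand is None or cand in seen:
                continue              # nothing new to evaluate
            seen.add(cand)
            if satisfies(cand, d):
                parent[cand] = d
                tries[cand] = 0
                if score(cand) < score(best):
                    best = cand
                d = cand              # descend into the accepted candidate
                advanced = True
                break
        if not advanced:
            d = parent[d]             # attempts exhausted: backtrack
    return best
```

In a usage scenario, `generate` would wrap the heuristic of Section 3.2 and `satisfies` would compute the distances needed for conditions (2) and (3).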

4 Retrieval from FAQ 

We implemented a retrieval system and a tool to visualize the document space 
on a Windows NT platform using C++. This system operates in two phases. 
The first phase performs initial retrieval according to inputs given by a user: 
a query, a dictionary and how many retrieved documents should be displayed. 
Then the system allows the user to rank positive documents with an arbitrary 
number. Given the feedback information, in the second phase the system first 
searches through the dictionary vector space to optimize the user’s dictionary, 
and then retrieves again according to the dictionary thus obtained. 

We examined whether our methods are effective on real document databases 
by experimenting with FAQ (Frequently Asked Questions) documents extracted 
from a VC++ mailing list archive which consists of 11361 messages including 3216 
question documents. 8363 index terms were selected from all the documents. A 
user who had thorough knowledge of VC++ was selected for this experiment. 
A model answer for the retrieval consists of queries and documents that are 
extracted from the FAQ documents and are assessed as relevant to the query 
by the intended user. To evaluate the retrieval effectiveness of our methods we 
utilize the familiar measures of precision and recall. 
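
For reference, precision and recall can be computed as in the short Python sketch below. The document identifiers are made up for illustration and are not taken from the experiment.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    retrieved : identifiers of the documents returned by the system
    relevant  : identifiers of the documents judged relevant by the user
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical example: 4 documents retrieved, 2 of the 3 relevant ones found.
p, r = precision_recall(["d1", "d2", "d3", "d4"], ["d1", "d2", "d9"])
```

Precision is the fraction of retrieved documents that are relevant, and recall is the fraction of relevant documents that are retrieved.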







[Figure 1: plot of Evaluation Value (y-axis, approx. 2.22 to 2.34) against Dictionary Vector's ID (x-axis, 1 to 436); panel title: Evaluation Value (N=3)] 

Fig. 1. Transition of the best dictionary vector through the search space with N = 2 
and N = 3 (left and right, respectively). Dictionary vectors that satisfy 
conditions (2) and (3) are assigned to the x-axis in order of generation, and the y-axis 
gives the performance measure calculated in (3). 



Table 2. Precision and recall of an initial dictionary and an optimized dictionary. 

             initial dictionary    optimized dictionary 
                     -              N = 2       N = 3 
precision          50.0%            75.0%      100.0% 
recall             66.6%           100.0%      100.0% 



Our experiments aimed to clarify the following three issues in comparison 
with more conventional methods. 

[ experiment 1 : effectiveness of modeling approach ] 

To evaluate our modeling method we compared the difference of the retrieval 
result between the tf-idf method and a modeling approach that took correlation 
of terms into account. The experiment was done with an initial dictionary vector 
that consisted of all terms existing in document space. The result shows the 
advantage of our modeling method for most of the queries. 

[ experiment 2 : effectiveness of relevance feedback ] 

We examined the effectiveness of relevance feedback by restructuring the 
document space through optimization of a dictionary vector. Using our modeling 
method that takes correlation of terms into account, we compared the retrieval 
results obtained with the initial dictionary vector and with the search-optimized 
one. 

The performance of the search in this experiment with N = 2 or N = 3 
is shown in Figure 1. According to Figure 1, with N = 2, the 28-th generated 
dictionary vector is finally determined as the best dictionary through a search 
tree which has 42 nodes in total and a depth of 13 nodes. In the same way, 
with N = 3, the 222-th generated dictionary vector is finally determined as the 
best. Precision and recall of retrieval with the optimized dictionary are shown 
in Table 2. The result shows that relevance feedback by restructuring the document 
space through optimization of the dictionary vector is effective for the particular query 
in question. Table 2 also indicates that even the best dictionary vector generated 
through the relatively small search space with N = 2 performs significantly better 
than the initial dictionary. 

[ experiment 3 : preservation of user’s personal bias ] 

We examined how the user’s personal bias affected subsequent retrieval. 
Experimental retrieval was performed using the dictionary vector obtained in 
the second experiment (N = 3). We used four queries and compared the results 
with those retrieved using the initial dictionary. Both precision 
and recall increased from 0.666 to 1.000 for one query and were unchanged 
for the others. Although the result shows that the user’s viewpoint was preserved 
after a series of retrieval processes and can affect subsequent retrieval, we require 
further experiments to evaluate our method. 

5 Conclusion 

In this paper we proposed a new method of employing relevance feedback through 
the restructuring of document space, where the modeling function employs a cor- 
relation of terms. Our method adapted the document space to the user’s mental 
model by manipulating a dictionary vector. As a consequence, the user’s view- 
point was preserved after a series of retrieval processes and applied to subsequent 
retrievals. 

We have constructed a flexible perceptive and cognitive framework for inter- 
preting text data from users’ various viewpoints and controlling them for each 
person. In the future we hope to carry out experimental studies using a test 
collection for IR systems, as well as experimental studies that include evaluation 
of the effects of preserving user bias and performance when feedback is scarce. 
Our system also needs a faster term-correlation calculation in order to achieve 
powerful retrieval. We have implemented an algorithm to minimize the 
processing time for the calculation of the correlation between index terms. 

References 

1. Rocchio, J.J.: Relevance feedback in information retrieval. The Smart Retrieval 
System - Experiments in Automatic Document Processing. Prentice-Hall, Inc. 
(1971) 313-323 

2. Buckley, C. and Salton, G.: Optimization of relevance feedback weights. Proc. of 
ACM SIGIR Conference on Research and Development in Information Retrieval 
(1995) 351-357 

3. Harman, D.: Relevance feedback revisited. Proc. of ACM SIGIR Conference on 
Research and Development in Information Retrieval (1992) 1-10 

4. Raghavan, V.V. and Sever, H.: On the reuse of past optimal queries. Proc. of ACM 
SIGIR (1995) 344-350 

5. Yu, C.T.: A Formal Construction of Term Classes. Journal of the ACM (1975) 17-37 

6. Salton, G., Allan, J. and Buckley, C.: Automatic Structuring and Retrieval of Large 
Text Files. Communications of the ACM, Vol. 37 (1994) 97-108 

7. Salton, G. and Buckley, C.: Term-Weighting Approaches in Automatic Text 
Retrieval. Information Processing & Management, Vol. 24 (1988) 515-523 



A Smart Pointer Technique for Distributed 
Spatial Databases 



Orlando Karam¹, Fred Petry², and Kevin Shaw³ 

¹ Wofford College 
karamoa@wofford.edu 
² Tulane University 
fep@eecs.tulane.edu 
³ NRL-SSC 
shaw@nrlssc.navy.mil 



Abstract. We are developing a distributed object oriented spatial 
database prototype. We are extending the GIDB prototype (developed at 
NRL for using and querying OVPF databases) to support distribution. 
GIDB is a Geographical Information System coupled with an object- 
oriented spatial database, which is implemented using the GemStone 
Object Database. We are currently exploring alternatives to add distri- 
bution to the prototype. 

We tried GemEnterprise as our distributed OODB, but needed to add a 
different model of distribution to it. In order to do so, we used a smart 
pointer technique, taking advantage of the dynamic capabilities of the 
SmallTalk language. 

Also, we did several experiments comparing performance of returning a 
smart pointer from a remote method, or returning a copy of the remote 
object. The smart pointer usually is smaller, so the transmission cost is 
smaller, but future messages have to be done remotely instead of locally. 



1 Smart Pointers 

A very common technique in Object Oriented programming is to define a smart 
pointer class, which behaves just like a pointer to a specific object. In C++ this 
is usually done by overloading the operator ->. In SmallTalk, there are no explicit 
pointers, although each object has an associated identity, which is known as an 
object oriented pointer (oop). There are standard ways to get the oop of an object and 
(non-standard) ways to get the object back from the oop. 

In SmallTalk typing is dynamic, and message passing can be done dynamically 
too, which allows us to do this much more easily. We can call the 
perform:with: method on an object, with the name of the message and the 
arguments meant for it, and the corresponding method will be executed. 

If we want something done whenever an object does not understand a message 
(that is, there is no method that corresponds to that message), we just need to 
redefine the doesNotUnderstand method, which is called whenever a message is 
sent to an object and the object doesn’t have a method for that message. This 



R. Loganantharaj et al. (Eds.): IEA/AIE 2000, LNAI 1821, pp. 645-650, 2000. 
© Springer-Verlag Berlin Heidelberg 2000 






method gets as arguments the name (as a symbol) of the message that was sent, 
and the arguments that were meant for that message, and can then execute any 
code; the value it returns is returned as if it were the value returned from 
the original method call. This is a well known technique in SmallTalk circles. 

GemEnterprise offers the capability of remote string execution, but not real 
remote method calling nor closures. We wanted to implement remote method 
calling, and in order to do so, we adapted the Smart Pointer technique. 

We redefine the doesNotUnderstand method, and when we receive that 
message we create a string that, when executed remotely, will make the remote object 
execute the message. We essentially have to marshal everything and transform 
the arguments into the string. We did this by calling the passivate method on 
each of the arguments to the message. 
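
The forwarding idea can be illustrated outside SmallTalk with the Python sketch below, where `__getattr__` plays a role comparable to doesNotUnderstand. It is only an analogy under stated assumptions: `RemoteSession` is a hypothetical in-process stand-in for a repository that can only execute strings, `repr` stands in for the passivate step, and `eval` is used purely for demonstration.

```python
class RemoteSession:
    """Hypothetical stand-in for a repository offering remote string execution."""

    def __init__(self):
        self.objects = {}

    def register(self, oid, obj):
        self.objects[oid] = obj

    def execute(self, source):
        # A real system would ship the string over the network;
        # eval is used here only for illustration.
        return eval(source, {"objects": self.objects})


class SmartPointer:
    """Forwards every message it does not understand to the remote object."""

    def __init__(self, session, oid):
        self._session = session
        self._oid = oid

    def __getattr__(self, name):
        # Invoked only when normal attribute lookup fails.
        if name.startswith("_"):
            raise AttributeError(name)

        def forward(*args):
            # Marshal copies of the arguments into the string.
            marshalled = ", ".join(repr(a) for a in args)
            source = f"objects[{self._oid!r}].{name}({marshalled})"
            return self._session.execute(source)

        return forward


session = RemoteSession()
session.register("lib1", [3, 1, 2])
pointer = SmartPointer(session, "lib1")
position = pointer.index(2)   # executed on the "remote" list
```

Because the arguments are turned into text, the remote side works on copies, which mirrors the copy semantics for arguments discussed in the following paragraph.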

Notice that a copy of the arguments is what is sent to the remote machine, 
so if the message modifies any of its arguments, the local objects WILL NOT get 
modified. It would have been nice if we could send ’remote pointers’ to the local 
objects, so we get the ’right’ semantics, however a limitation in GemEnterprise 
didn’t allow us to do so. Also, blocks can not be passed as arguments (this is a 
relatively simple thing to implement, since we have documentation about where 
are all the pointers in the block; however it wasn’t important for what we wanted, 
so we didn’t implement it). 

The dynamic message calling capabilities of SmallTalk allow us to do this 
with a minimum of effort, and in a very general way. Had we been doing this 
in C++, for example, we would have had either to define a different smart pointer 
for each public interface (usually each class), and have that one define the same 
interface as that class, or to overload the operator -> and return a pointer to a 
local object, which would force us to bring a copy of the remote object locally. 

Also the dynamic typing allows us to define a smart pointer class that can 
point to objects of any type, and make all the required behavior dynamic. 

2 Our Distributed Database (GemEnterprise) 

For our prototype, we used GemEnterprise, which adds distribution capabilities 
to the GemStone object database. The distribution model supported by Ge- 
mEnterprise involves making a copy of the remote object on the local machine 
and sending messages to the local copy. The copies can later be synchronized 
explicitly (they are NOT automatically synchronized). We wanted the ability to 
send messages directly to remote objects, without making a local copy, and have 
the execution of the corresponding method on the remote machine. 

In the GemEnterprise model, a machine defines remote object spaces, (ROS), 
that contain objects stored in remote repositories (that is, other GemStone 
servers). Through the ROS you can access objects in the remote repository by 
remote string execution, or, once you have an object from that repository, by 
sending messages to that object and getting other objects from there. There is 
a big limitation here, since once you get an object from an ROS, a local copy of 
it is made automatically, and all messages are executed on the local copy (so if, 
for example, you want to execute a query on a collection, the collection with all 
its members will be copied locally, and then the query executed over them). 

There is the possibility of defining different consistency policies for each 
remote repository that determine whether the objects obtained from it will 
keep a link to the original remote ones (if they do not, they are called 
copies). If a link is kept to the original object, then we have replicates. There 
can be read-only or modifiable replicates (if modifiable, then changes made on 
the local copies will be transferred to the original repository). Also, there is the 
possibility of defining how a copy of an object is made, on a class basis (that is, 
objects of class A will be copied this way etc). The definition may specify which 
instance variables to copy, and up to how many levels. For the instance variables 
that are not copied, a stub is generated that should un-stub and make a copy 
whenever a message is sent to it. 

When a local copy is updated the changes are NOT transmitted immediately 
to the remote repository; likewise, when the remote object changes those changes are 
NOT transmitted immediately to our machine. Instead, the ROS has to be refreshed 
in order for both machines to synchronize their copies of those objects. 

There are 3 problems (for our purposes) with this model. First, the updating 
has to be explicitly managed; second, the execution is always done on the local 
machine; and third, local copies are kept, and they consume space in our local 
database. 

3 GIDB Prototype 

GIDB is a prototype Object-Oriented GIS developed at NRL-Stennis to display 
and query databases that use the Object Vector Product Format (OVPF), which 
is an object oriented extension to the Vector Product Format, also widely used 
at NRL. In Geographical Information Systems one important concept is that 
of coverage or layer, which is a set of thematically related objects. In OVPF a 
library represents a set of coverages with data with the same scale. A database 
represents a set of libraries, which are spatially related (they contain informa- 
tion about the same area). Databases contain (among other attributes) a set of 
libraries, and libraries a set of coverages. 

GIDB was developed in SmallTalk, using GemStone as the OO Database. 
Currently there is also a Java/CORBA interface to it. 

We specifically wanted to devise ways to add distributed database capabilities 
to GIDB, and thought using GemEnterprise was the best way. GemEnterprise’s 
distribution model was not what we wanted, so we had to extend it. 

4 Experiments 

We tested the difference it would make to return either a value or another smart 
pointer. Since we had some objects that may be huge, we wanted to have the 
ability to execute the method remotely, and only return certain kinds of objects. 
To this effect, we added another variable to our smart pointer class, which we 
called level. Each time we created a new smart pointer as a result of sending a 






message to another smart pointer, the level was decremented, and when it got 
to 0, a copy of the object was returned instead of a smart pointer to it. 
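
The level mechanism can be sketched in Python as follows. This is an in-process illustration only: the class name `LevelPointer` and the toy `Library` and `Coverage` classes are hypothetical, a direct method call stands in for remote execution, and `copy.deepcopy` stands in for shipping a copy of the object.

```python
import copy

class LevelPointer:
    """Forwarding pointer that hands out further pointers until its level runs out."""

    def __init__(self, target, level):
        self._target = target
        self._level = level

    def __getattr__(self, name):
        if name.startswith("_"):
            raise AttributeError(name)

        def forward(*args):
            result = getattr(self._target, name)(*args)   # "remote" execution
            new_level = self._level - 1
            if new_level <= 0:
                return copy.deepcopy(result)              # return the object itself
            return LevelPointer(result, new_level)        # return another pointer

        return forward


class Coverage:
    def __init__(self, objs):
        self.objs = objs

    def objects(self):
        return self.objs


class Library:
    def __init__(self, cov):
        self.cov = cov

    def coverage(self):
        return self.cov


library = Library(Coverage([1, 2, 3]))
pointer = LevelPointer(library, 2)
coverage = pointer.coverage()   # level 1: still a pointer
objs = coverage.objects()       # level 0: a copy of the collection
```

Choosing the starting level therefore decides how far down a chain of messages the results stay remote before an actual copy is transferred.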

We performed several experiments, with different queries, to see if our hy- 
pothesis was true, and how important the effect was. 

We designed 2 experiments; in each we ran a query, changing some 
parameters, and timing the runs for performance. The experiments were performed 
at night, when there were no users, to minimize the effect of spurious network 
traffic. 

In the first experiment, we wanted to test the performance difference between 
returning a copy of the object and returning a ’remote pointer’ to it (the remote 
pointer is a class we implemented, called myForw). Depending on the size of the 
object returned, the performance can be greatly affected. We obtained results 
conforming to what we expected. 

We had a problem with backwards pointers. A common technique in OOP is 
to represent a relationship with inverse by using 2 pointers, one in each direction. 
If we want to bring the ’whole’ object, that is, the object and all other objects it 
points to (recursively), this could be a problem, since we can bring ’big’ objects 
as results of queries. This was happening in our case, since a database has a 
collection of libraries, but also the library points to its database, and so on with 
other objects. After making all those backwards pointers null, we were able to 
run the experiments. 

In the second experiment we assumed that getting the pointers isn’t a good 
idea if the objects we get are intermediate results and we want to send more 
messages to them. We supposed at some point in time the difference in network 
traffic would be offset by the increased number of remote messages (since each 
message takes some time). We were surprised to find that even one more message 
sent to the objects would offset the advantage. 

4.1 Experiment 1: Get Different Objects 

We tried 2 different queries: one that selects ALL objects in the coverage (our 
Bounding Box is the world) and one that selects none (our BB is a point). 

We report results from 10 runs each, with all coverages and levels. The re- 
sults of level, are indicated in table 1. These results were obtained during the 
day, so network load introduces a bigger variance. Average time is reported in 
milliseconds, rounded to the nearest integer. 



Level   Comment 
0       Brings the whole library 
1       Brings the whole coverage 
2       Brings a collection of the objects 
3       Brings a collection of smart pointers 

Fig. 1. Levels and what they bring 






Coverage   Number of objects   Passivated size 
1                         10              4541 
2                         48            123407 
3                         38            119302 
4                         62            143223 
5                         12            162208 
6                        444            742901 
7                        579            811504 
8                         18             44352 
9                        220           1660043 

Fig. 2. Coverages and the number of spatial objects on each 



Level \ Cov        1       2       3       4       5       6       7       8       9 
0             150551  154998  155684  153538  153777  153491  155035  155124  154067 
1               3962    7381    6520    7924    7655   26599   23450    4938   64292 
2               3712    4670    4606    5215    5262   18144   14645    4061   44677 
3               3484    3679    3509    3609    3553    3919    3674    3868    3590 

Fig. 3. Results for the query that selects all objects in the coverage (average of 10 
runs, in milliseconds, rounded to the nearest millisecond) 



Here the advantage of using the smart pointer is very clear, especially compared 
with level 0, which brings the whole library (this closely corresponds to the standard 
GemEnterprise model), even when we only want objects in one coverage. In fact, 
it will bring the whole library even when the query returns no object (Fig. 4). 

In level 1 (which brings the whole coverage) we can still notice the effect 
of transmitting the whole coverage even if we only need some parts of it. For 
levels 2 and 3, the variation on the network load makes more difference than the 
number of objects sent or the amount of data transmitted. 



4.2 Experiment 2: Intermediate Results. 

In this experiment we select only a part of the object (in this case its id, which 
is a numeric value). Here level 2 (bring the spatial objects) yields better 
performance than level 3 (pointers to objects), since level 3 now needs to send 
a message to each (remote) spatial object to get its ID. What is somewhat 
surprising is that level 1 (bring the whole coverage) yields better results than 
level 3, even if we transmit more data, since we avoid 2 remote executions for 
each object. 

We are aware that different products would yield different results, and different 
inflexion points; here we report the results obtained in our set-up. 






Level \ Cov        1       2       3       4       5       6       7       8       9 
0             146548  151452  152217  148945  147145  148989  151495  152163  150499 
1               4414    7164    7836    7970    8668   25456   23615    5423   62906 
2               4095    3859    4346    4126    3902    3907    4176    3930    3886 
3               4031    3778    4265    3774    3917    4002    4074    4686    4333 

Fig. 4. Results for a query that selects NO objects in the coverage (average of 10 runs, 
in milliseconds, rounded to the nearest millisecond) 



Level \ Cov        1       2       3       4       5       6       7       8       9 
0             110080  117560  116930  120080  116170  120420  123630  139530  124610 
1               3910    6030    6050    7060   10010   22800   21830    6540   55040 
2               3660    4370    4310    4990    5450   15880   13980    5790   47860 
3               3640   55020   44500   70780   19180  491290  698590   39050  797620 

Fig. 5. Results with query that gets the id of all the objects in each coverage. 



5 Conclusions 

We have shown how we can leverage the smart pointer technique and apply it to 
distributed applications, and more specifically distributed spatial databases. We 
have also shed some light on the trade-offs in doing so and how we can 
quantify them, and have shown what the precise numbers are for our set-up. 

Acknowledgements 

The authors would like to thank the US Marine Corps Warfighting Lab (MCWL) 
under Program Element 63640M for funding this effort. 



References 

1. Maria Cobb, et al: Object-Oriented Database design and implementation issues for 
Object Vector Product format. Tech Rep NRL/FR/7441-95-9641, Naval Research 
Laboratory, (1996) 

2. Gemstone Systems Inc: GemEnterprise User’s Guide, (1997) 

3. M. Tamer Ozsu, et al: Distributed Object Management, Morgan Kaufmann, 
(1994) 

4. Nabil R Adam, Aryya Gangopadhyay: Database issues in Geographic Information 
Systems, Kluwer Academic Publishers, (1997) 

5. M Tamer Ozsu, Patrick Valduriez: Principles of Distributed Database Systems, 
Prentice Hall, (1991) 



Deploying the Mobile-Agent Technology in 
Warehouse Management 



Mei-Ling L. Liu, Tao Yang, and Sema Alptekin¹ and Kiyoshi Kato² 



¹ California Polytechnic State University, San Luis Obispo, CA, 93407, USA 

mliu@csc.calpoly.edu 
² Nihon Fukushi University, Japan 
kato@handy.n-fukushi.ac.jp 



Abstract. Mobile agents, a relatively new paradigm for distributed software 
development, have become an accessible technology in recent years. 
The potential benefits of this paradigm, including the reduction of network 
bandwidth consumption and latency, promise to revolutionize distributed 
applications. 

This paper describes a proposed project to deploy mobile agent technology 
in warehouse management. 



1 Introduction 

Mobile agents, a relatively new paradigm for network software development,
have become an accessible technology in recent years. The potential benefits of
this technology, including the reduction of network bandwidth consumption and
latency, promise to revolutionize distributed applications [9,10]. A number of
research institutions and industrial entities have engaged in the development of
elaborate supporting systems for this technology [11,10], and deployments of
its applications can be expected to proliferate in the near future.

One of the potential applications is in the area of warehouse management 
systems (WMS), where the strength of mobile agents as an autonomous data 
gathering and state information monitoring tool can be exploited. 

In this paper, we describe a prototype project to deploy mobile agents in 
support of Just-In-Time (JIT) inventory [3]. 



1.1 Warehouse Management Systems 

Modern warehouses take full advantage of computer technology. The inventory
is managed by a sophisticated computer-assisted system [1,5]. A fully
integrated warehouse management system oversees the entire process of receiving
and shipping, and maintains up-to-date records of the inventory. Such a system
consists of networked computer systems, bar-code scanners, radio-frequency
(RF) data-collection systems, and hand-held computers, along with traditional
warehouse equipment such as forklifts, pickers, conveyors, pallets, and racks.

An integrated WMS provides the following key functionalities [2]: 



R. Loganantharaj et al. (Eds.): IEA/AIE 2000, LNAI 1821, pp. 651-659, 2000.
© Springer-Verlag Berlin Heidelberg 2000







— Receiving: Each arriving pallet or case receives a bar code label which
identifies both the stock-keeping unit (SKU) and case-pack quantities. This
information is scanned using portable scanners operated by workers or stationary
readers mounted along the conveyor lines. The scanned data is transmitted
to a host computer through a radio-frequency wireless communications link.

— Storage/put-away: If the inventory is to be stored, the WMS designates a
storage location for it, and, when the unit is delivered to its storage, the
system verifies that the storage location is as designated.

— Picking: The WMS receives orders and schedules picking activities. Workers
on lift trucks equipped with RFDC terminals are directed to storage locations.
Picked inventory is scanned so that the WMS can verify the pick
accuracy and update the inventory database.

— Shipping: The WMS assigns docking areas to shipping inventory. Once the
orders arrive at the shipping station, the WMS can generate packing and
shipping labels, possibly working in conjunction with weighing equipment
and shipping manifest systems.

— Cross-docking: See next section.



1.2 Cross-Docking 

Cross-docking, or flow-through receiving and shipping, is the implementation of 
the concept of just-in-time (JIT) inventory or Kanban (in Japanese). Under this
concept, the storage and handling of inventory in the supply chain is minimized 
through efficient scheduling enabled by modern computer technology. Ideally, 
inventory arrives just in time for shipment, so that the arriving units can be 
directly dispatched for shipment. 

In effect, JIT inventory trades material storage for information flow, and its 
implementation requires a sophisticated information system capable of main- 
taining precise and up-to-date records of inventory and orders. 

Companies in the forefront of technology, such as General Motors and Toyota 
[4,3], have practiced cross-docking with success, and the JIT inventory concept 
can be expected to gain in importance at any operation which involves receiving 
and shipping of products. 

1.3 Mobile Agents 

Agents [13] are software entities which perform specific functions autonomously 
according to a mission. Conventional programs require coordination to perform
data processing: information is input (delivered) to a program, which processes 
the data and generates output to be consumed by the user in some way. By con- 
trast, once dispatched, a software agent functions on its own and asynchronously 
- requiring no synchronized interaction with its initiator - in pursuit of its goals. 
Agent-based software is a powerful software paradigm that is currently under 
intense research, and is considered an emergent Information Technology (IT) 
tool which may potentially transform certain sectors in commerce, manufacturing,
and industry [6]. Software agents have been applied “to filter information,
match people with similar interests, and automate repetitive behavior” [7], and
work has been underway to deploy “agents that buy and sell” [7] to conduct 
transactions in e-commerce. Agents exist for assisting online auctioning (Auc- 
tionBot, developed at the University of Michigan [14]) and for assisting market 
negotiation (Kasbah [15], Tabican [12]).

Figure 1 illustrates the use of software agents in auctioning. Agents acting on 
behalf of individual bidders interact with an auction server. Each agent carries 
the information input to it by the bidder, and, once dispatched, interacts autonomously with
the auction server, a software component on the host computer of the auctioneer. 
If so programmed, an agent is capable of negotiating transactions autonomously 
(requiring no interaction with the human bidder). At the end of the auction, the 
agent reports the outcome to the bidder. 




Fig. 1. An agent-assisted auctioning system 



A mobile agent [9] is a special kind of software agent which, in addition
to possessing the basic characteristics of all software agents, has the additional
ability to transport itself from computer to computer in pursuit of its goals.
This mobility makes such agents a promising software paradigm for applications
running on networked computer systems. The potential of the technology can
be illustrated with a prototype application described in [16] (see Figure 2). In
this application, a mobile agent is deployed to schedule meetings. The agent
visits the workstations of the meeting participants successively, interacting with
software on each system to investigate available time slots for the meeting. At
the end of its journey, the agent returns to its initiator to report its findings.







Fig. 2. A Prototype Mobile-agent Application 
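
The meeting-scheduling itinerary can be illustrated with a small sketch. It is
not the implementation reported in [16]: the class name SchedulingAgent, the
host names and the slot labels are assumptions made for illustration, and
migration between workstations is simulated by an ordinary loop inside one
process rather than by real network transport.

```python
from dataclasses import dataclass, field


@dataclass
class SchedulingAgent:
    """Hypothetical agent that carries candidate meeting slots between hosts."""
    candidate_slots: set
    visited: list = field(default_factory=list)

    def visit(self, host_name, free_slots):
        # Runs "locally" on the host: keep only the slots this participant has free.
        self.candidate_slots &= set(free_slots)
        self.visited.append(host_name)


def run_itinerary(agent, calendars):
    """Simulate the agent visiting each participant's workstation in turn."""
    for host_name, free_slots in calendars.items():
        agent.visit(host_name, free_slots)
    # Back at the initiator: report the slots that suit every participant.
    return sorted(agent.candidate_slots)


# Illustrative calendars (assumed data, one entry per workstation).
calendars = {
    "alice-ws": ["Mon 10:00", "Tue 14:00", "Wed 09:00"],
    "bob-ws": ["Tue 14:00", "Wed 09:00"],
    "carol-ws": ["Mon 10:00", "Wed 09:00"],
}
agent = SchedulingAgent(candidate_slots={"Mon 10:00", "Tue 14:00", "Wed 09:00"})
common = run_itinerary(agent, calendars)
print(common)
```

Only the shrinking set of candidate slots travels with the agent; each
participant's full calendar stays on its own workstation.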



Lange and Oshima cite “seven good reasons for Mobile Agents” [9], summarized
as follows:

1. They reduce the network load. Conventional network software requires net- 
work communication between computers for the necessary interaction among 
the participants. A mobile agent, by contrast, executes locally on each system 
that it visits, thereby reducing the latency introduced by network commu- 
nication. 

2. They reduce network latency. By reducing network communication, the delay 
associated with data transmission is also minimized. 









3. They encapsulate protocols. Conventional network software employs a proto- 
col agreed upon by all participants, and requires the precise implementation 
of the protocol by all parties. A mobile agent, on the other hand, can be 
programmed to carry the protocol with it and interact with each system it 
visits based on the protocol that it carries. This feature has the potential of 
allowing the protocol to be updated without impacting existing servers. 

4. They execute asynchronously and autonomously, a characteristic common
to software agents in general.

5. They adapt dynamically. Mobile agents are capable of sensing changes in
their execution environment and reacting autonomously.

6. They are naturally heterogeneous: Mobile agents are potentially platform- 
independent. 

7. They are robust and fault-tolerant. If implemented properly, mobile agents 
are capable of reacting dynamically to unfavorable conditions, including sys- 
tem and network failures. 

2 Mobile Agents in Warehouse Management 

For the reasons cited above, we believe that the deployment of mobile agents can 
produce significant benefits in warehouse management systems. Mobile agents 
are especially powerful as a tool for information gathering, an aspect vital to 
warehouse management systems. In particular, a mobile agent can serve as a
smart agent for gathering time-critical information. One function in warehouse
management where this is especially beneficial is cross-docking.

We propose a prototype deployment of the technology to coordinate the ac- 
tivities in the supply chain for cross-docking. In this system, a mobile agent 
travels among the key entities in the chain to gather information to facilitate 
cross-docking. The key entities include the suppliers, the warehouse, as well as 
the transportation vehicles engaged in delivering inventory. A pictorial descrip- 
tion of the system is presented in Figure 3. 



2.1 The Proposed System 

In our vision, a mobile agent travels among a supplier (we shall call it Acme
Widgets), a transportation vehicle (say truck 1234) dispatched by the supplier, and
the warehouse (Global Warehouse) to oversee the delivery of inventory. While
visiting the host computer at Global Warehouse, the agent gathers information
about inventory orders that pertain to Acme Widgets. The agent carries the
orders to the host computer at Acme Widgets, and interacts with the computer
to place the order and schedule the delivery. During the delivery of the
inventory, the agent travels among the computers at the supplier, the warehouse,
and the on-board computer of truck 1234. It continuously monitors the
global position of the truck and reports its progress to both the supplier and the
warehouse. At Global Warehouse, the agent interacts with the WMS to report
the progress of the delivery and to provide information that allows the WMS to
adjust the just-in-time receiving schedule accordingly. The agent can also lend
assistance upon the arrival of the inventory by informing the trucker in advance
of the docking location and reporting the final arrival to the WMS.



Fig. 3. Mobile-Agents-Assisted Cross-Docking

The benefits of the proposed system are as follows: 

— Timely information is made available by the mobile agent. A mobile agent
is dedicated to tracking each order and gives undivided attention to the
progress of the order, including en-route information. This information allows
the JIT inventory concept to be carried out in a continuous manner,
providing opportunities for dynamic adjustment of the shipping and receiving
schedules to minimize handling and storage outlay at the warehouse.

— The mobile agent serves as a facilitator by bringing orders to the attention
of the supplier, by providing up-to-date status of inventory delivery, and by
providing information such as dock assignment to the shipper in advance.

— The mobile agent can interact with the WMS to update inventory
autonomously. As a further improvement, the bar codes on pallets or cases can
be scanned by the shipper while the inventory is being loaded
on the delivery vehicle. The scanned information can be transmitted to the
onboard computer, to be retrieved by the mobile agent upon its visit to the
onboard computer. The mobile agent can then relay the information to the
WMS at the warehouse, allowing the WMS to update its inventory database
according to the progress of the shipment.
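
The circulation of one order-tracking agent among the warehouse, the supplier
and the truck can be sketched as follows. This is a single-process simulation
under stated assumptions, not the prototype itself: the class names, the
host() method, the order identifier PO-1, the dock name and the GPS
coordinates are all hypothetical, and each "visit" is a plain method call
instead of a network hop.

```python
from dataclasses import dataclass, field


@dataclass
class OrderAgent:
    """Hypothetical agent dedicated to one order; its fields are the state it carries."""
    order_id: str
    items: dict = field(default_factory=dict)    # SKU -> quantity ordered
    scanned: dict = field(default_factory=dict)  # SKU -> quantity loaded on the truck
    position: tuple = None                       # last GPS fix read on the truck
    dock: str = None                             # dock assigned by the WMS


class Warehouse:
    def __init__(self, orders, free_docks):
        self.orders, self.free_docks = orders, list(free_docks)
        self.expected = {}                       # inventory the WMS expects to receive

    def host(self, agent):
        if not agent.items:                      # first visit: pick up the order
            agent.items = dict(self.orders[agent.order_id])
        if agent.scanned:                        # later visit: update records, assign a dock
            self.expected[agent.order_id] = dict(agent.scanned)
            if agent.dock is None:
                agent.dock = self.free_docks.pop(0)


class Supplier:
    def __init__(self):
        self.placed = {}

    def host(self, agent):
        self.placed[agent.order_id] = dict(agent.items)  # place the order


class Truck:
    def __init__(self, scans, gps_fix):
        self.scans, self.gps_fix = scans, gps_fix
        self.assigned_dock = None

    def host(self, agent):
        agent.scanned = dict(self.scans)         # bar codes scanned during loading
        agent.position = self.gps_fix            # reading from the onboard GPS receiver
        self.assigned_dock = agent.dock          # tell the trucker where to dock, if known


warehouse = Warehouse({"PO-1": {"widget": 40}}, free_docks=["dock-3"])
supplier = Supplier()
truck = Truck(scans={"widget": 40}, gps_fix=(35.3, -120.7))
agent = OrderAgent("PO-1")
for stop in (warehouse, supplier, truck, warehouse, truck):
    stop.host(agent)
print(agent.dock, truck.assigned_dock, warehouse.expected)
```

The dock assignment reaches the truck only on the agent's second visit, which
mirrors the idea that the trucker is informed in advance of arrival.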



2.2 The Proposed Implementation 

2.3 Programming language 

One of the reasons that mobile agent technology has become accessible is the
advent of the programming language Java. Among its many powerful features,
Java provides the infrastructure for supporting mobile agents very efficiently
[10]. Most importantly, the capability of serializing (flattening) an object allows
an agent, in the form of an object, to be transported over the network from
machine to machine. There are elaborate wrapper systems - systems built on the
Java infrastructure - such as Concordia [8] and Aglets [9], which support mobile
agents in an open system, providing features such as persistence and security
needed in an environment where threats of failures and malice must be guarded
against. For our prototype system, however, we propose an implementation in Java
without any wrapper system. The justification is that we are assuming a closed
system where the participating computers can be expected to be trustworthy
and where network and system failures can be expected to be rare occurrences.
Using plain Java allows the implementation to be lightweight (that is, small in
code space and fast in execution), and relatively simple.
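
The role of serialization can be shown with a short sketch. The paper proposes
Java object serialization; the example below substitutes Python's standard
pickle module purely for illustration, and the Agent class, its fields and the
to_bytes/from_bytes helpers are assumptions rather than part of the proposed
system.

```python
import pickle


class Agent:
    """Hypothetical agent: carried state plus the behaviour to run at each host."""

    def __init__(self, mission, itinerary):
        self.mission = mission
        self.itinerary = list(itinerary)  # hosts still to visit
        self.log = []                     # findings gathered so far

    def run_at(self, host):
        self.log.append(f"visited {host}")
        self.itinerary.remove(host)

    def to_bytes(self):
        # Flatten the agent's state (not its code) into a byte string.
        return pickle.dumps(self.__dict__)

    @classmethod
    def from_bytes(cls, data):
        # Rebuild the agent on the receiving host, which must already have this class.
        # Unpickling is only safe when the sender is trusted (a closed system).
        agent = cls.__new__(cls)
        agent.__dict__.update(pickle.loads(data))
        return agent


agent = Agent("track order PO-1", ["supplier", "truck", "warehouse"])
agent.run_at("supplier")

wire_bytes = agent.to_bytes()            # what a sending host would transmit
revived = Agent.from_bytes(wire_bytes)   # what the next host would reconstruct
revived.run_at("truck")
print(revived.itinerary, revived.log)
```

The revived object resumes from the carried state, so the log and the
remaining itinerary survive the hop. The original object on the sending host
is unaffected by what the copy does afterwards.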



2.4 Network support 

Mobile agents transport themselves over networks, and hence network support is
essential. The computers participating in our system must be fully connected by
a network to allow the movements of the mobile agent. We propose the
use of wireless network links to supplement existing wired links. In particular,
wireless links are required to allow a transportation vehicle to participate in
the system, because of the mobile nature of the vehicle. On the other hand, a
wired link, especially if it already exists, can be used for the transportation of
the agent between the warehouse and the supplier.

2.5 System support 

We assume the existence of information systems at both the supplier and the 
warehouse, and that these systems may be modified to interact with the mo- 
bile agent either directly or indirectly. Direct interaction is preferred. Indirect 
interaction can be accomplished by, for instance, having the agent deposit infor- 
mation in the host computer system where it can be retrieved by the software 
of the information system. 

Each participating entity (the supplier, the warehouse, and the shipping ve- 
hicle) must have a computer capable of running an agent server which hosts the 
agent during its visit. Since we have chosen to implement using the Java tech- 
nology, these computers must be Java-enabled, that is, capable of running Java 
programs. For the shipping vehicle, a small, onboard computer equipped with a 
Global Positioning System (GPS) receiver is envisioned. 

Figure 4 illustrates the proposed architecture of the implementation. 

3 Future Extension 

We have implemented a prototype of our proposed system, where a single mobile
agent oversees a simulated shipment on a simulated supply chain consisting of
three distributed computer systems, taking on the roles of the supplier, the
shipment, and the warehouse, respectively.

[Figure 4 labels: GPS receiver; JVM = Java Virtual Machine; wireless network link;
network link, wireless or wired]

Fig. 4. The Proposed System Architecture


As the next step of our project, we envision deploying the software on a 
system as described in section 2.1. 

The scope of the application of the technology can be further broadened: 

— The scope of responsibilities of the agent may be expanded to oversee all 
orders pertaining to one supplier. Thus, there will be one agent per supplier, 
each responsible for coordinating the activities between the supplier and the 
warehouse. 

— The range of the agent may be expanded to oversee all orders for all suppliers. 
In this scenario, the agent will visit the host computer at each supplier, at 
each shipping vehicle, and at the warehouse. In addition to coordinating 
shipments between the suppliers and the warehouse, such an agent may also 
coordinate activities among the suppliers. 

— For a large supply distribution system where multiple warehouses or dis- 
tribution points are employed in the supply chain, each distribution point 
may be served by a separate agent. The agents may communicate among 
themselves to coordinate the inventory management among the distribution 
points. 

— The mobile agents may be intelligent agents capable of decision-making. 
For example, agents may be programmed with the intelligence to support 
adaptive scheduling and dynamic routing of shipments. 









4 Conclusion 

In this paper, we proposed a prototype application in warehouse management 
where the mobile-agent technology is deployed to facilitate cross-docking. We 
assert that the proposed system will provide timely information to all parties 
involved, allowing the inventory handling and storage outlay at the warehouse 
to be minimized. Additionally, the mobile agent provides autonomous assistance 
in the coordination of activities along the supply chain. 

Future extensions for the system were also proposed. 

We believe that the mobile agent technology has the potential of contributing 
significantly to the management and control of supply chains. 



References 

1. Cooke, J.: Cross-docking software: Ready or not? Logistics. Oct. 1997

2. Randall, S.: The Value of WMS. Modern Materials Handling. May 1998

3. Minahan, T.: GM looks outside to fuel internal JIT initiatives. Purchasing. Sept.
1996

4. Bar codes, wireless terminals cut waste from Toyota’s supply chain. Automated
Data Capture Applications in Manufacturing, Warehousing, and Distribution. Dec.
1998

5. Make your warehouse sing. Logistics. April 1996

6. Joshi, A., Singh, M.: Multiagent Systems on the Net. Communications of the ACM.
42 (1999) 39-40

7. Maes, P., Guttman, R., Moukas, A.: Agents That Buy and Sell. Communications of
the ACM. 42 (1999) 81-91

8. Koblick, R.: Concordia. Communications of the ACM. 42 (1999) 96-97

9. Lange, D., Oshima, M.: Seven Good Reasons for Mobile Agents. Communications
of the ACM. 42 (1999) 88-89

10. Wong, D., Paciorek, N., Moore, D.: Java-based Mobile Agents. Communications
of the ACM. 42 (1999) 92-102

11. Milojicic, D., Douglis, F., Wheeler, R.: Mobility. Addison Wesley, Reading,
Massachusetts. 1999

12. Lange, D., Oshima, M.: Programming and Deploying Java Mobile Agents with
Aglets. Addison Wesley, Reading, Massachusetts. 1998

13. Berney, B.: Software Agents - A Review.
http://www.doc.mmu.ac.uk/STAFF/B.Berney/research/ag-rev.htm

14. Wurman, P., Wellman, M., Walsh, W.: The Michigan Internet AuctionBot: A
configurable auction server for human and software agents. Second International
Conference on Autonomous Agents. 1998

15. Chavez, A., Maes, P.: Kasbah: An Agent Marketplace for Buying and Selling
Goods. Proceedings of the First International Conference on the Practical
Application of Intelligent Agents and Multi-Agent Technology (PAAM ’96). April 1996

16. Liu, M. L., Liu, Y.: A Prototype Mobile-Agent Application. Proceedings
Conference on Computers and their Applications (CATA), March 2000 (to appear).




A Lightweight Capability Communication
Mechanism



David S. Robertson^1, Jaume Agustí^2, Flavio S. Correa da Silva^3,
Wamberto W. Vasconcelos^4, and Ana Cristina V. de Melo^3

^1 Division of Informatics, University of Edinburgh, Edinburgh, Scotland
^2 Institut d’Investigació en Intel·ligència Artificial, Bellaterra, Catalunya
^3 Instituto de Matemática e Estatística, Universidade de São Paulo, São Paulo, Brazil
^4 Dept. Estatística e Computação, Universidade Estadual do Ceará, Ceará, Brazil



Abstract. A persistent problem in managing the interaction between 
distributed agents is to be able to coordinate the communication between 
systems without having continually to ask each system for information 
about what it can do. One form of coordination is through the use of ca- 
pability descriptions that are advertised by each agent and managed by a 
brokering mechanism. The task of the broker (which may be centralised 
or distributed among the agents) is to accept queries and to hypothesise 
the means of obtaining answers based only on the capability descrip- 
tions. This has the advantage that plans for coordinating answers can be 
constructed by the broker without having to contact the agents. Broker- 
ing, however, is not straightforward because capability descriptions can 
be complex and may be conditional on interactions with other agents. 
Brokering must also take into account the possibility that the ontologies 
used by each agent may differ, so some means of relating the terminology 
of capabilities of agents is needed. Many sophisticated systems exist for 
tackling parts of this problem but there have been comparatively few at- 
tempts to build lightweight engineering solutions by adapting well estab- 
lished methods. We describe a simple way of implementing a lightweight 
but powerful brokering mechanism. 



1 Introduction 

Knowledge sharing on a large scale, between significant numbers of agents, is dif- 
ficult for at least three reasons. One is that it is difficult to connect the ontology 
of a given system with those of others. A second is that it is difficult to compare 
the inference methods used in each system [2,3,4]. A third is that in systems 
with automated interaction between agents it is difficult to know which systems 
should be asked for the information we require. A partial solution to this third 
problem is through the use of capability descriptions of the main facilities each 
agent can deliver, thus allowing an assessment to be made about which agents 
to consult without having to wake up each agent to do so. An architecture for 
delivering this kind of ability often involves a brokering system, which has the 
job of assessing capabilities and suggesting how queries to different agents may 
be combined to discharge complex capabilities. This is a complex problem at the
heart of agent communication and, consequently, it is possible to design
heavyweight sophisticated systems for dealing with it^1. We believe, however, that
there is a niche for simpler methods which give reasonably sophisticated forms
of capability description using well established techniques which, because of their
familiarity, are easier for engineers to pick up. This paper describes a method
of describing capabilities and brokering them which is formal (our method uses
Horn clauses with the normal semantics of this subset of predicate calculus) but
is also lightweight in the sense that we use straightforward engineering methods
in a tightly focussed way.

R. Loganantharaj et al. (Eds.): IEA/AIE 2000, LNAI 1821, pp. 660-672, 2000.
© Springer-Verlag Berlin Heidelberg 2000

Many recent research projects have taken into account the shift from
process-centric to communication-centric computation and explored the various
ways in which stand-alone computing devices and software can enlarge their
capabilities in this way^2. A capabilities brokering mechanism has been a focus
of our interest, partly because of the research project DECaFf-KB^3.

We present our ideas through an illustrative example. In section 2 we sum- 
marise our proposed lightweight capabilities brokering architecture; in section 3 
we introduce our example, based on interactions between ecological models for 
environmental monitoring and problem-solving; in sections 4 to 7 we go through 
the use of our proposed architecture. Finally, in section 8 we draw some conclu- 
sions and address future work. 

2 The Basic Brokering Method 

The basic method used in our capability brokering system is depicted in
Figure 1. Initially, it is necessary for each participating agent to have described its
capabilities in our capability language. In the diagram we depict these capabilities
as the boxed C1 and C2 attached to each of the two agents in the example.
Agents advertise their capabilities simply by sending these to the broker, which
records the capabilities and the agents who claim to be able to supply them. In
the next stage, another agent (in our diagram, agent 3) sends a query to the
broker. The broker constructs from its capability descriptions (now detached from
the agents) its internal description, which we call a “brokerage structure”, of
how the query might be answered based on those capabilities. It then translates
its brokerage structure into a sequence of performative statements describing
the messages which it thinks should enable the query to be satisfied by
requesting appropriate agents to discharge their capabilities. In the final stage this
performative information is used by the agent which sent the query to select
which agents to contact; to send appropriate messages to them; and to await
appropriate responses.

^1 See for example the surveys in [8,12] or some recent results reported in [11]
for databases.

^2 The first software applications based on Internet/WWW resources were database
interfaces through the Common Gateway Interface, Microsoft Active Server Pages and
Java or Java-like applets and servlets. Although these applications extended
information extraction (and to some extent information processing) from single
machines to networks of interconnected machines, the conceptual framework for how
computing processes were designed and executed was still closer to the traditional
framework than to the network-centric framework. The maturity of object-based
software development and of distributed computing led to distributed object-based
technology, in which computation is actually performed across a computer network.
Many standards for implementing distributed object-based technology have been
proposed, for example OMG CORBA and Microsoft DCOM, that provide software
developers with their respective languages to specify ways of establishing
communication and message-passing among software objects. Sun’s JINI architecture
proposes the extension of the network-centric computing framework to a host of
devices with varied computational capabilities: brokering is replaced by
discovery/lookup processes, every piece of code is mobile through the network and
computation is fully decentralised. Many proposals have been presented to add
communicating features to knowledge-based systems. The majority of them assume
that the systems willing to share information and knowledge with each other all
abide by a lingua franca, whose expressive power encompasses those of the
communicating systems, at least with respect to the pieces of information and
knowledge that these systems will share with each other. This is the approach
adopted for example in the design of the Knowledge Interchange Format, the
Knowledge Query and Manipulation Language and Arcol/Foundation for Intelligent
Physical Agents.

^3 Distributed Environment for Cooperation Among Formalisms for Knowledge-based
Systems, sponsored by the Consortium British Council/CAPES (Brazil). The main goal
of that project is to provide computational means for heterogeneous knowledge-based
systems to exchange messages and share their knowledge and capabilities
to answer queries. The capabilities taken into account in that project are, therefore,
descriptive of the expressive power and inference mechanisms of the underlying
logical systems of each system, as well as the knowledge they contain and are
prepared to offer to different systems.
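
The stages described above (advertise, query, build a brokerage structure,
translate to performatives) can be sketched in a few lines. This is a minimal
illustration under strong assumptions, not the mechanism of the paper: the
capabilities here are propositional rules ("head if all of body") without
variables, whereas the paper uses Horn clauses, and the names Broker,
advertise, to_performatives, the agent labels and the capability names are
hypothetical.

```python
class Broker:
    """Hypothetical broker: stores advertised capabilities and plans from them alone."""

    def __init__(self):
        self.capabilities = []  # (agent, head, body), read as "head if all of body"

    def advertise(self, agent, head, body=()):
        self.capabilities.append((agent, head, tuple(body)))

    def broker(self, goal, pending=()):
        """Return a brokerage structure (goal, agent, sub-structures), or None."""
        if goal in pending:  # avoid circular capability chains
            return None
        for agent, head, body in self.capabilities:
            if head != goal:
                continue
            subs = [self.broker(g, pending + (goal,)) for g in body]
            if all(s is not None for s in subs):
                return (goal, agent, subs)
        return None


def to_performatives(structure):
    """Flatten a brokerage structure into messages, prerequisites first."""
    goal, agent, subs = structure
    messages = [m for s in subs for m in to_performatives(s)]
    return messages + [("ask", agent, goal)]


broker = Broker()
broker.advertise("agent1", "vegetation_size")
broker.advertise("agent2", "herbivore_level", body=["vegetation_size"])

plan = broker.broker("herbivore_level")  # no agent is contacted at this stage
messages = to_performatives(plan)
print(messages)
```

The plan is built from the stored descriptions only. The agents themselves are
contacted later, when the querying agent sends the listed messages.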




Fig. 1: A capability brokering method 



The diagram of Figure 1 shows only one of the ways in which the method of this
paper may be deployed. In particular, the method does not prescribe that the
agents submitting queries must be different from the agents supplying
capabilities. It also does not prescribe that there must physically be a single,
centralised broker to which all the agents advertise and to which all queries
must be addressed (as in the first stage of the diagram of Figure 1). It would be
possible to use our method with decentralised brokering, provided that each agent
had a copy of the brokering mechanism and there was a way of broadcasting
capabilities to groups of agents. This alternative to the first diagram of Figure 1
is given in Figure 2^4.

To develop a working instance of this sort of architecture it is necessary to 
define the following components: 

— A capability language: this can be understood as a partial specification of the
associated agent. It describes key information which can be obtained from the
agent without stipulating how it is derived.

— A correspondence language: since the way problems are described and solved may
differ between agents, there needs to be a way for brokers to spot correspondences
between the capabilities of systems. Notice that this is between capabilities only,
not between the actual implementations of the agents, which we expect to be much
more complex than their capability descriptions. This is an area where deep
research is possible but we present here a simple but effective solution.

— A capability brokering mechanism: brokering requires the construction of one or
more data structures, each of which describes a way of supplying the information
required by a client based on the capabilities advertised by the servers. This must
be done automatically because we assume that the client has no innate knowledge of
the capabilities of servers or the correspondences between their ontologies.

— A translation to performatives: here we use the word “performative” in deference
to the KQML [5] language for message passing between agents. The messages we send
can be viewed as simplified forms of KQML performative, although this is not
central to our paper. The purpose of performatives in this system is to obtain the
information promised from the servers through their capabilities and coordinated in
the brokerage structure. The additional conversion is needed, rather than using the
brokerage structure directly, because decisions must be made about the ordering of
performative messages and these decisions are not uniquely determined by the
brokerage structure.

^4 Decentralised brokers will require additional care in e.g. maintaining
consistency among decentralised capabilities descriptions. We are aware of this
needed care at implementation stages of decentralised brokers and we intend to
develop this issue in future articles, e.g. via a mobile verifier that will run
cyclically through existing capabilities descriptions collecting and updating
information among them. Our idea of a mobile verifier has been inspired mostly
by [9].
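
The point that the brokerage structure does not fix a unique message order can
be illustrated with a dependency graph. The sketch below is an assumption made
for illustration: it represents a brokerage structure as a mapping from each
request to the requests it depends on, and it uses the standard-library
graphlib module (Python 3.9 or later) to pick one admissible order. The agent
and query names are hypothetical, and the paper's own translation procedure is
not shown here.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical brokerage structure: each request maps to the requests whose
# answers it needs first. Agent and query names are illustrative only.
needs = {
    ("agent1", "query_c"): {("agent1", "query_a"), ("agent2", "query_b")},
    ("agent1", "query_a"): set(),
    ("agent2", "query_b"): set(),
}


def order_performatives(needs):
    """Pick one admissible message order; independent requests could be swapped."""
    order = TopologicalSorter(needs).static_order()
    return [("ask", agent, query) for agent, query in order]


messages = order_performatives(needs)
for message in messages:
    print(message)
```

The two independent requests may be emitted in either order, while the request
that depends on both always comes last. Choosing among the admissible orders is
the kind of decision that the conversion step has to make.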

In Sections 4 to 7 we work through each of these components in detail, using 
a running example for illustration. First, we introduce the example. 

3 Running Example 

To make our example concrete we describe two agents in detail. These are taken 
from our earlier work on formal representations of ecological models (see [10]). 
They are not realistically sized agents but have the advantage of being sophisti- 
cated for their size and being based on widely differing approaches to modelling, 
despite having a certain degree of complementarity. They are therefore good 
examples for a short paper like this one. An overview of the two models, cast 
into the brokering framework of Figure 1, is shown in Figure 3. On the left is 
a model described in System Dynamics notation, describing the dynamics of a 
herbivore-vegetation system. The boxes are state variables giving the size of the
vegetation (v) and herbivore (h) populations. The arrows represent flows be- 
tween state variables, which originate and end in an external source-sink (ss). 
These flows are described by equations which are regulated by the parameters 
(k1, k2 and k3). On the right is a model of the cyclic interaction between 
predator and prey populations, which we have shown as two synchronised 
oscillations in their size over time. The most striking feature of these oscillations is 
that the qualitative sizes of these populations are in opposition - so when the 
numbers of prey are low the numbers of predators are high and vice-versa. This 
unusual effect can be caused by time lags in the response of each population to 
the numbers in the other. In our example, the predator-prey model will describe 
only this opposition between high and low values, not the quantitative changes 
as the populations oscillate. 




Fig. 3: Example of brokering between two modelling systems 



Although the point of our paper is to avoid considering the details of agent 
implementation when brokering capabilities, it is helpful to consider simple im- 
plementations as a way of explaining where distinctions lie. We use the Prolog 
programming language to describe these implementations because it fits conve- 
niently to the language which we use later to describe capabilities. This is not, 
however, a fundamental restriction of our method. To emphasise that we are 
making no assumptions about what each agent contains we have made our ex- 
amples a little non-standard. Figure 4 describes a model of an ecological system 
in a System Dynamics notation. Figure 5 gives a simple predator-prey model. 
Although not conventional agent applications, and simplified for the purpose 
of demonstration, others have conducted knowledge sharing experiments using 
similar examples - see e.g. [1]. 

To save space, we have not given all of the definitions needed for the systems 
of Figures 4 and 5. We have, however, enough detail to define some capabilities 
in the next section. 
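
As a rough illustration of the calculation that the System Dynamics agent of Figure 4 performs, the following Python sketch applies the difference equations to the two state variables. It is only an assumed rendering of the Prolog definitions: the names `step` and `value`, the dictionary layout, and the use of integer time points starting at 0 are ours (the figure leaves init_time and previous undefined); the flow equations and numbers are copied from the figure.

```python
# Parameter values and initial state, taken from Figure 4.
PARAMS = {"k1": 0.01, "k2": 0.005, "k3": 0.02}
INITIAL = {"v": 1000.0, "h": 50.0}


def step(state, params=PARAMS):
    """Advance both state variables by one time step (hypothetical helper)."""
    v, h = state["v"], state["h"]
    flow_ss_v = params["k1"] * 100      # source-sink -> vegetation
    flow_v_h = params["k2"] * (v / h)   # vegetation -> herbivores
    flow_h_ss = params["k3"] * h        # herbivores -> source-sink
    return {
        "v": v + flow_ss_v - flow_v_h,  # previous value + inflow - outflow
        "h": h + flow_v_h - flow_h_ss,
    }


def value(name, t):
    """Value of state variable `name` at integer time t >= 0.

    Assumes time 0 is the initial time point and each later point follows
    its predecessor by one application of `step`.
    """
    state = dict(INITIAL)
    for _ in range(t):
        state = step(state)
    return state[name]
```

With these numbers the first step adds 1 to the vegetation and removes 0.1 from it, so `value("v", 1)` is approximately 1000.9 and `value("h", 1)` is approximately 49.1.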

4 Describing Capabilities 

Having defined the two agents of the previous section, we now wish to hide the 
detail of their knowledge representation language (Prolog) and inference mecha- 
nisms (the Prolog interpreter) and we also want to be selective about our claims 
to other systems about what each agent is able to do. For this we select a conve- 
nient language for describing capabilities. This must be constructed by engineers 
from a variety of backgrounds so its semantics should be as straightforward as 
possible. It must also be formal because it will be used automatically by the bro- 
kering system. We have chosen Horn clauses as our capability language because 
this provides us with convenient and expressive formal language which, as we 






We have two state variables (v representing vegetation biomass and h representing 
herbivore biomass) and three parameters (k1, k2 and k3), plus the value for each 
parameter and the initial value for each state variable. 

stvar(v).   init_value(v, 1000). 
stvar(h).   init_value(h, 50). 
param(k1).  param_value(k1, 0.01). 
param(k2).  param_value(k2, 0.005). 
param(k3).  param_value(k3, 0.02). 

We have flows from the source-sink (ss) to v; from v to h; and from h to ss. These are 
described as difference equations regulated by the parameters. 

flow(ss, v, Vf, [(k1,K)], Vf is K * 100). 
flow(v, h, Hi, [(v,Vp), (k2,K), (h,Hp)], Hi is K * (Vp/Hp)). 
flow(h, ss, Ho, [(h,Hp), (k3,K)], Ho is K * Hp). 

We can find a value for a parameter simply by looking up its value. For state variables, 
we can do the same at the initial time point but for any later time point we must 
calculate the value by finding the sum of input and output flows for it and adjusting 
its previous value by these amounts. 

value(X, T, V) :- param(X), param_value(X, V). 
value(X, T, V) :- stvar(X), init_time(T), init_value(X, V). 
value(X, T, V) :- stvar(X), not(init_time(T)), previous(T, Tp), 
    setof(f(I,X,V,A,B), flow(I,X,V,A,B), In), 
    setof(f(X,O,V,A,B), flow(X,O,V,A,B), Out), 
    sum_flows(In, T, SI), sum_flows(Out, T, SO), 
    value(X, Tp, Vp), V is Vp + SI - SO. 

We can also find out the influence of one variable on another by following through the 
network of flow connections. 

influence(X, Y) :- flow(X, Y, _, _, _). 
influence(X, Y) :- flow(X, Z, _, _, _), influence(Z, Y). 

Fig. 4: A System Dynamics model 



Predators and prey are antagonists of each other. Lynx are predators and Hare are 
prey. 

antagonist(X, Y) :- (predator(X), prey(Y)) ; (predator(Y), prey(X)). 
predator(lynx). 
prey(hare). 

The qualitative magnitude, Vx, of a population, X, will be the opposite of the qualitative 
magnitude, Vy, of its antagonist population, Y. The only qualitative magnitudes in this 
model are high and low, which oppose each other. 



size(X, Vx) :- antagonist(X, Y), size(Y, Vy), opposite(Vx, Vy). 
opposite(low, high). 
opposite(high, low). 

Fig. 5: An elementary predator-prey model 



shall show, can be used to provide a simple brokering mechanism. We make no 
claim that this is the best such language. 

The intuition behind our use of Horn clauses for capabilities is that we 
have four forms of capability, C, each of which is defined within the 
expression cap(K, C), denoting that the agent named K can deliver capability C in at 
least one instance or, if not, will signal failure. Valid options for C are: 

- A unit goal of the form P(A1, ..., An), where P is a predicate name and A1, ..., An 
are its arguments. 

- A conjunctive goal of the form (C1 ∧ ... ∧ Cm) where each Ci is a unit goal or 
a set expression. 

- A set expression of the form setof(X, C, S), where C is either a unit goal or a 
conjunctive goal; X is a tuple of variables appearing in C; and S is a set of instances 
of those tuples which satisfy C. 






- A conditional goal of the form Cc ← Cp, where Cc is a unit goal which the 
agent, K, will attempt to satisfy (but will not guarantee to satisfy) if the condition, Cp, 
is satisfied. Cp is either a unit goal or a conjunctive goal. 
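
The four forms of capability listed above can be pictured as a small family of data types. The Python sketch below is one possible encoding, not part of the original system: the class names (`Unit`, `Conj`, `SetOf`, `Conditional`, `Cap`) are hypothetical, and variables are represented simply as capitalised strings.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Unit:
    """A unit goal P(A1, ..., An)."""
    predicate: str
    args: Tuple[str, ...]


@dataclass(frozen=True)
class Conj:
    """A conjunctive goal (C1 and ... and Cm) of unit goals or set expressions."""
    goals: Tuple[Union[Unit, "SetOf"], ...]


@dataclass(frozen=True)
class SetOf:
    """A set expression setof(X, C, S)."""
    template: Tuple[str, ...]   # X: tuple of variables appearing in C
    goal: Union[Unit, Conj]     # C
    result: str                 # S: variable naming the set of instances


@dataclass(frozen=True)
class Conditional:
    """A conditional goal Cc <- Cp."""
    head: Unit                  # Cc
    condition: Union[Unit, Conj]  # Cp


@dataclass(frozen=True)
class Cap:
    """cap(K, C): agent K advertises capability C."""
    agent: str
    capability: Union[Unit, Conj, SetOf, Conditional]


# Example: cap(sd, (value(X, T, V) <- time(T) and stvar(X)))
value_cap = Cap("sd", Conditional(
    Unit("value", ("X", "T", "V")),
    Conj((Unit("time", ("T",)), Unit("stvar", ("X",)))),
))
```

Keeping the forms as distinct types makes it easy for a broker to dispatch on the kind of capability it has been given.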

We can now use this style of capability description for our examples in Fig- 
ures 4 and 5 (above). The choice of which capabilities to describe requires in- 
genuity on the part of the engineer of each agent. There is, however, a simple 
procedure for deciding initially what will be described. We describe this using 
the System Dynamics agent (named sd below): 

- First choose the predicate names that correspond to information that might 
usefully be transmitted by the system. In sd we could imagine transmitting information 
about the values and influences in the system so the predicate names are value and 
influence, assuming for convenience a direct translation between the predicate names 
internal to the agent and those we use in capability descriptions. 

- Then decide whether we can provide this information directly or if we need to 
impose conditions on provision. If we impose conditions then we should start again 
from the item above for those predicates which appear in the conditions. Information 
about influence can be given directly but for value we need to check that we are 
asking for an appropriate time and state variable. Therefore we add time and stvar to 
those which may be transmitted by the system. 

- For each of these predicates decide how the information about it is to be trans- 
mitted. This can be in one or more of the four forms of capability described earlier. 
The information on value, we already know, is conditional on time and stvar. The 
information on time can be provided unconditionally but only for individual instances, 
since the set of possible times is not bounded. The information for stvar can be made 
available either as instances or as the full set because there are only two of these. 
Continuing this process for all the predicates, and identifying variables appropriately, 
yields the set of capabilities shown below. 

cap(sd, (value(X, T, V) ← time(T) ∧ stvar(X))) 
cap(sd, (stvar(Y), setof(X, (stvar(X) ∧ influence(X, Y)), S))) 
cap(sd, (stvar(X), setof(Y, (stvar(Y) ∧ influence(X, Y)), S))) 
cap(sd, setof((X, Y), (stvar(X) ∧ stvar(Y) ∧ influence(X, Y)), S)) 
cap(sd, (value(X, T, V) ← time(T), influence(X, Y))) 
cap(sd, (value(Y, T, V) ← time(T), influence(X, Y))) 
cap(sd, time(T)) 
cap(sd, setof(X, stvar(X), S)) 
cap(sd, influence(X, Y)) 
cap(sd, stvar(X)) 
cap(pp, prey(Y)) 
cap(pp, predator(X)) 



Notice that for the predator-prey system of Figure 5 we have only two ca- 
pabilities (labelled pp above) which allow information about the names of the 
predator and prey to be advertised. We were not able to advertise information 
about the size of the populations in the predator-prey system because it, alone, 
is not capable of determining this information. There is only one clause in the 
Prolog program of Figure 5 for determining size and this is conditional on infor- 
mation being available about the size of the antagonist population - so to know 
the size of the predator we need to know the size of the prey and vice-versa, 
but neither is asserted in the predator-prey system. There needs to be a way of 
advertising a capability which is conditional on the agent in question receiving 
additional information from another agent, mediated by the broker. For this, we 
have a partial capability description, p_cap(K, C, E), which is identical to our 
original capability description but with an additional argument, E, containing 
the capability required from another agent. Using this representation for size in 
our predator-prey system we obtain: 






p_cap(pp, (size(X, V1) ← predator(X) ∧ prey(Y)), size(Y, V2)) 
p_cap(pp, (size(Y, V2) ← predator(X) ∧ prey(Y)), size(X, V1)) 



We now turn to the next representational task before brokering: that of 
defining correspondences between agents. 

5 Describing Correspondences 

Partial capabilities like the ones in the previous section cannot be discharged 
unless we either are lucky enough to have identical capabilities being described 
in different agents or we have taken the trouble to describe correspondences be- 
tween capabilities. There are numerous ways of expressing such correspondences 
and no consensus on the best approach. We could, for example, try to build an 
interlingua [7] into which capabilities from different agents were translated and 
out of which we could translate the information needed to satisfy partial capabili- 
ties. This, however, makes strong assumptions about the integrity of information 
transmitted through a shared language. For simplicity, we have instead used def- 
initions which give direct correspondences between pairs of agent capabilities. 
Each correspondence is of the form corr(K1, C1, K2, C2, G) ← P where C1 is a 
capability in agent K1 which corresponds to capability C2 in agent K2 with the 
constraint G restricting the acceptable substitutions for variables in C1 and C2. 
The precondition, P, is an optional conjunction of other correspondences upon 
which the main correspondence depends. A set of correspondences between value 
in the sd agent and size in the pp agent appear below. The first of these states 
that values of less than or equal to 10 in sd correspond to low sizes in pp 
provided that the state variable over which the value is defined in sd corresponds to 
the predator over which size is defined in pp. The last of the correspondences below 
states that the state variable, h (the herbivores), in sd corresponds to the prey, 
hare, in pp. 

corr(sd, value(X1, T, V), pp, size(X2, low), (V ≤ 10 ∧ C)) ← corr(sd, stvar(X1), pp, predator(X2), C) 
corr(sd, value(X1, T, V), pp, size(X2, low), (V ≤ 500 ∧ C)) ← corr(sd, stvar(X1), pp, prey(X2), C) 
corr(sd, value(X1, T, V), pp, size(X2, high), (V > 10 ∧ C)) ← corr(sd, stvar(X1), pp, predator(X2), C) 
corr(sd, value(X1, T, V), pp, size(X2, high), (V > 500 ∧ C)) ← corr(sd, stvar(X1), pp, prey(X2), C) 
corr(sd, stvar(h), pp, prey(hare), true) 
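
Read operationally, these correspondences turn a quantitative value from sd into a qualitative size for pp, with the threshold depending on whether the state variable corresponds to a predator or a prey. The Python sketch below illustrates that reading under our own assumptions: the function name `qualitative_size` and the two lookup tables are hypothetical, and only the single stvar-to-prey correspondence given above is encoded.

```python
# Thresholds from the correspondences: the role played by the pp population
# decides which sd biomass value still counts as "low".
THRESHOLDS = {"predator": 10, "prey": 500}

# corr(sd, stvar(h), pp, prey(hare), true)
STVAR_TO_POPULATION = {"h": ("prey", "hare")}


def qualitative_size(stvar, value):
    """Map an sd value for `stvar` to a pp fact (population, size).

    Returns None when no correspondence is known for the state variable,
    mirroring the failure of the precondition of the corr rule.
    """
    if stvar not in STVAR_TO_POPULATION:
        return None
    role, population = STVAR_TO_POPULATION[stvar]
    size = "low" if value <= THRESHOLDS[role] else "high"
    return (population, size)
```

For instance, `qualitative_size("h", 50)` gives `("hare", "low")`, whereas the vegetation variable v has no counterpart in pp and yields `None`.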



We now have representations of capabilities and correspondences. Our next 
step is to define a means of generating brokerage structures. 

6 A Brokering Mechanism 

The purpose of a broker is to find, for a given query posed by a client, the ways 
in which agents which have advertised their capabilities might be contacted in 
order to satisfy that query. In our example, the client might want to obtain 
a prediction of the size of the Lynx population and, to obtain this, it will be 
necessary to acquire the services of both of our example agents. We need a 
formal way of representing this sort of combination of capabilities, for which 
we use what we call a brokerage structure, of the form c(K, C), where K is 
the name of the agent which should be able to deliver the capability and C is a 
description of the sources of the capability. C can be in any of the following forms: 

- a capability available directly from K; 

- a term of the form c(K, dq(Q, QC)), where Q is a capability obtainable from K 
conditional on its other capabilities and QC describes how these capabilities are 
obtained; 

- a term of the form c(K, pdq(Q, QC, QP)), where Q is a capability obtainable 
from K conditional on its other capabilities and on capabilities external to K, and 
QC and QP describe how these internal and external capabilities (respectively) are 
obtained; 

- a term of the form c(conj, co(CQ1, CQ2)), where CQ1 and CQ2 are two 
capability structures which must jointly be satisfied; 

- a term of the form c(K, cn(Q, G, c(K1, Q1))), where K1 is the name of an agent 
different from K which allows capability structure Q to be delivered in combination 
with capability structure Q1 provided that the correspondence constraints given by G 
are satisfiable. 

We can now describe a method for constructing brokerage structures of the 
form given above using the capability and correspondence definitions which re- 
side in the brokering system. Notice that this does not involve any additional 
interaction with the individual agent systems - the computation can be done 
entirely within the broker. We describe the algorithm below as a logic program 
because this is compact, precise and declarative but the mechanism itself could 
be implemented in a procedural language. The algorithm proceeds by cases cor- 
responding to each of the forms of brokerage structure given above. The dq 
structure is obtained from a conditional cap definition; the pdq structure from 
a p_cap definition; the co structure from two capability structures; and the cn 
structure via correspondence. In all cases where we introduce a new capability 
into our structure we must demonstrate that it too is obtainable from our 
definitions - hence the recursive use of broker in the algorithm. For partial capabilities 
(pdq structures) we need the same form of brokering but with the constraint that 
the external capability required by the agent comes from some other source. The 
easy way to describe this is simply to replicate the broker algorithm but with 
an additional argument (Kn in e_broker below) that records the original agent 
name and prevents it from being used to satisfy the external capability goal. 

broker(Q, c(K, Q)) ← cap(K, Q) 
broker(Q, c(K, dq(Q, QC))) ← cap(K, (Q ← C)) ∧ broker(C, QC) 
broker(Q, c(K1, pdq(Q, QC, QP))) ← p_cap(K1, (Q ← C), P) ∧ broker(C, QC) ∧ e_broker(P, K1, QP) 
broker((Q1, Q2), c(conj, co(CQ1, CQ2))) ← broker(Q1, CQ1) ∧ broker(Q2, CQ2) 
broker(Q2, c(K2, cn(Q2, G, c(K1, BQ)))) ← corr(K1, Q1, K2, Q2, G) ∧ broker(Q1, c(K1, BQ)) 

e_broker(Q, Kn, c(K, Q)) ← cap(K, Q) ∧ not(K = Kn) 
e_broker(Q, Kn, c(K, dq(Q, QC))) ← cap(K, (Q ← C)) ∧ not(K = Kn) ∧ broker(C, QC) 
e_broker(Q, Kn, c(K1, pdq(Q, QC, QP))) ← p_cap(K1, (Q ← C), P) ∧ not(K1 = Kn) ∧ broker(C, QC) ∧ e_broker(P, K1, QP) 
e_broker((Q1, Q2), Kn, c(conj, co(CQ1, CQ2))) ← e_broker(Q1, Kn, CQ1) ∧ e_broker(Q2, Kn, CQ2) 
e_broker(Q2, Kn, c(Kn, cn(Q2, G, c(K1, BQ)))) ← corr(K1, Q1, Kn, Q2, G) ∧ broker(Q1, c(K1, BQ)) 
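
To make the case analysis above concrete, the following Python sketch implements a much simplified broker. It is an illustration under strong assumptions rather than the algorithm itself: goals are ground strings, so there is no unification; the capability, partial capability and correspondence tables contain only the instances needed for the lynx query; broker and e_broker are merged into one function with an optional `requester` argument; and a depth bound stands in for search control. Brokerage structures are nested tuples `(agent, payload)`.

```python
from itertools import product

# cap(K, Q) or cap(K, (Q <- C)): (agent, goal, conditions)
CAPS = [
    ("pp", "predator(lynx)", ()),
    ("pp", "prey(hare)", ()),
    ("sd", "time(T)", ()),
    ("sd", "stvar(h)", ()),
    ("sd", "value(h,T,V)", ("time(T)", "stvar(h)")),
]
# p_cap(K, (Q <- C), P): (agent, goal, conditions, external goal)
P_CAPS = [
    ("pp", "size(lynx,S)", ("predator(lynx)", "prey(hare)"), "size(hare,low)"),
]
# corr(K1, Q1, K2, Q2, G)
CORRS = [
    ("sd", "value(h,T,V)", "pp", "size(hare,low)", "V =< 500"),
    ("sd", "stvar(h)", "pp", "prey(hare)", "true"),
]


def broker(goal, depth=6, requester=None):
    """Yield brokerage structures for `goal`.

    With `requester` set this plays the role of e_broker: capabilities must
    come from another agent and correspondences must deliver to `requester`.
    """
    if depth == 0:  # crude bound against non-terminating search
        return
    if isinstance(goal, tuple):  # conjunctive goal: every part must be brokered
        parts = [list(broker(g, depth, requester)) for g in goal]
        for combo in product(*parts):
            yield ("conj", ("co", list(combo)))
        return
    for k, q, conds in CAPS:
        if q != goal or k == requester:
            continue
        if not conds:
            yield (k, q)                      # direct capability
        else:
            for qc in broker(conds, depth - 1):
                yield (k, ("dq", q, qc))      # conditional capability
    for k, q, conds, ext in P_CAPS:
        if q != goal or k == requester:
            continue
        for qc in broker(conds, depth - 1):
            for qp in broker(ext, depth - 1, requester=k):
                yield (k, ("pdq", q, qc, qp))  # partial capability
    for k1, q1, k2, q2, g in CORRS:
        if q2 != goal or (requester is not None and k2 != requester):
            continue
        for sub in broker(q1, depth - 1):
            if sub[0] == k1:
                yield (k2, ("cn", q2, g, sub))  # via correspondence


structures = list(broker("size(lynx,S)"))
```

On this reduced knowledge base the query produces two structures, differing only in whether prey(hare) is obtained directly from pp or through the correspondence with stvar(h) in sd.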



To demonstrate how the algorithm above works we can apply it to our ex- 
ample of brokering a query about the size of the Lynx population. If our broker 
attempts to satisfy broker(size(lynx, S), R) using the capabilities and 
correspondences we defined earlier then it can obtain a number of solutions for R. 
Two of these are given in Figure 6. 

The first (expression 1) says that the pp agent can answer provided that 
lynx is a predator and hare is a prey according to pp and that the size of the 
hare population is considered to be low in pp via a correspondence to a value 
of no more than 500 for the biomass of h in the sd agent. The sd agent should 
be able to establish this if it is capable of finding an appropriate time and of 
recognising h as a state variable. 

c(pp, pdq(size(lynx, S), 
          c(conj, co(c(pp, predator(lynx)), 
                     c(pp, prey(hare)))), 
          c(pp, cn(size(hare, low), 
                   (V =< 500, true), 
                   c(sd, dq(value(h, T, V), 
                            c(conj, co(c(sd, time(T)), 
                                       c(sd, stvar(h))))))))))        (1) 

c(pp, pdq(size(lynx, S), 
          c(conj, co(c(pp, predator(lynx)), 
                     c(pp, cn(prey(hare), 
                              true, 
                              c(sd, stvar(h)))))), 
          c(pp, cn(size(hare, low), 
                   (V =< 500, true), 
                   c(sd, dq(value(h, T, V), 
                            c(conj, co(c(sd, time(T)), 
                                       c(sd, stvar(h))))))))))        (2) 

Fig. 6: Example brokerage structures 

The second brokerage structure (expression 2) is similar to the first except 
that confirmation that hare is a prey within the pp agent is not done locally 
but via a correspondence to the state variable h in the sd agent. This raises the 
issue of preference between brokerage structures. We do not present a solution 
to this here but there are heuristics for deciding which possible brokerings we 
would prefer. In our example, the first structure seems preferable to the second 
because it makes fewer correspondences between expressions in different agents 
and, since each correspondence raises the possibility of imprecise equivalences 
having been defined, it seems better to take fewer of these risks. 

We are now at the stage where we know what we want to broker but we have 
not committed to how this might be done. This is the topic of the next section. 



7 Assembling a Performative Sequence 

The brokerage structures of the previous section describe which pieces of infor- 
mation are worth querying from each agent and stipulate how they can interact 
in satisfying the query originally posed by a client. They do not, however, pre- 
scribe the sequence in which we should transmit messages to the agents which 
we are coordinating. This is important because, for example, we do not want 
to ask the sd agent for value(h, T, V) until we have asked it for an 
appropriate time (by asking it to satisfy time(T)). How we establish an appropriate 
sequence of messages depends on the conventions being used for message pass- 
ing. For the purposes of example, we use a simple convention in which we have 
three communication acts (“performatives” in the terminology of KQML) that 
are transmitted sequentially: 

- ask(K, C) denoting that we are asking the agent named K to discharge the 
capability C. We must obtain a response to this message with an instance for C before 
proceeding with the rest of the sequence. 

- tell(K, C) denoting that we are informing the agent named K that a capability 
which it required externally can be discharged by a correspondence to another agent. 
We must obtain a response from K indicating that it accepts the information before 
proceeding with the rest of the sequence. 

- test(G) denoting that whatever system is sending the messages should attempt 
to satisfy the constraint, G, before sending any further messages in the sequence. 

We now need an algorithm for translating the brokerage structures of the 
previous section into message sequences which conform to our message passing 
conventions. We describe this below in the style of a Definite Clause Grammar 
(DCG) [6] where an expression of the form Sh ⇒ S denotes that we are permitted 
the sequence represented by the expression Sh if we are permitted the sequence 
represented by expression S. S can be a sequence (separated by commas) of any 
of the following expressions: a subsequence expression of the form S(A1, ..., An); 
a call to a predicate, written in braces, of the form {P(A1, ..., Am)}; or a sequence 
of terminal symbols of the form [X1, ..., Xk]. The grammar is used to generate the 
sequence of terminal symbols, corresponding to performatives, by unpacking the 
brokerage structure. We assume in the definitions below that the DCG rules are 
mutually exclusive, so there is only one possible rule for each form of brokerage 
subterm. The grammar is readily implemented in Prolog but could be implemented 
in other languages. Some of the rules require a predicate dependent_queries(QC, DQ) 
which is used to construct the conditions, DQ, for a dependent capability from the 
brokerage structure, QC. This is a straightforward definition because we simply pull 
the appropriate subterms out of QC but we omit it here to save space. 

assemble(c(S, dq(Q, QC))) ⇒ {dependent_queries(QC, DQ)}, assemble(QC), [ask(S, (Q ← DQ))] 
assemble(c(S1, pdq(Q, QC, QP))) ⇒ {dependent_queries(QP, DQ)}, assemble(QC), assemble(QP), [ask(S1, (Q ← DQ))] 
assemble(c(conj, co(CQ1, CQ2))) ⇒ assemble(CQ1), assemble(CQ2) 
assemble(c(S, cn(Q, C, CQ))) ⇒ assemble(CQ), [test(C), tell(S, Q)] 
assemble(c(S, Q)) ⇒ [ask(S, Q)] 

The assembled message sequences corresponding to brokerage structures 1 
and 2 are sequences 3 and 4, respectively, below. Notice how these have flattened 
the declarative brokerage structures into sequences which respect the procedural 
realities of message passing. For example, the request (fifth in sequence 3) to the 
sd agent to find a value for h provided that T is a time and h a state variable 
should come after we have asked whether sd is capable of generating a time, T, 
and of recognising that h is a state variable. 



ask(pp, predator(lynx)), ask(pp, prey(hare)), ask(sd, time(T)), 
ask(sd, stvar(h)), ask(sd, (value(h, T, V) ← time(T) ∧ stvar(h))), 
test((V ≤ 500, true)), tell(pp, size(hare, low)), 
ask(pp, (size(lynx, S) ← size(hare, low)))                              (3) 

ask(pp, predator(lynx)), ask(sd, stvar(h)), tell(pp, prey(hare)), 
ask(sd, time(T)), ask(sd, (value(h, T, V) ← time(T) ∧ stvar(h))), 
test((V ≤ 500, true)), tell(pp, size(hare, low)), 
ask(pp, (size(lynx, S) ← size(hare, low)))                              (4) 
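
The flattening performed by the grammar can also be sketched procedurally. The Python function below is an assumed rendering of the assemble rules for ground brokerage structures written as nested tuples `(agent, payload)`; the tuple layout and the helper `dependent_queries` are our own simplifications (the original definition of that predicate is omitted in the text), and the structure `S1` is a hand-written transcription of brokerage structure 1.

```python
def goal_of(struct):
    """Top-level goal delivered by a non-conjunctive brokerage structure."""
    _, payload = struct
    return payload if isinstance(payload, str) else payload[1]


def dependent_queries(struct):
    """Collect the goals on which a dependent capability is conditional."""
    agent, payload = struct
    if agent == "conj":
        return [goal_of(sub) for sub in payload[1]]
    return [goal_of(struct)]


def assemble(struct):
    """Flatten a brokerage structure into an ordered list of performatives."""
    agent, payload = struct
    if agent == "conj":                       # co: assemble each part in turn
        return [m for sub in payload[1] for m in assemble(sub)]
    if isinstance(payload, str):              # direct capability
        return [("ask", agent, payload)]
    kind = payload[0]
    if kind == "dq":
        _, q, qc = payload
        return assemble(qc) + [("ask", agent, (q, tuple(dependent_queries(qc))))]
    if kind == "pdq":
        _, q, qc, qp = payload
        return (assemble(qc) + assemble(qp)
                + [("ask", agent, (q, tuple(dependent_queries(qp))))])
    if kind == "cn":
        _, q, constraint, cq = payload
        return assemble(cq) + [("test", constraint), ("tell", agent, q)]
    raise ValueError(f"unknown brokerage term: {kind!r}")


# Brokerage structure 1, transcribed by hand.
S1 = ("pp", ("pdq", "size(lynx,S)",
             ("conj", ("co", [("pp", "predator(lynx)"), ("pp", "prey(hare)")])),
             ("pp", ("cn", "size(hare,low)", "V =< 500, true",
                     ("sd", ("dq", "value(h,T,V)",
                             ("conj", ("co", [("sd", "time(T)"),
                                              ("sd", "stvar(h)")]))))))))

messages = assemble(S1)
```

Applied to `S1`, the function yields eight performatives in the same order as sequence 3, ending with the conditional request to pp for size(lynx,S).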



8 Conclusions 

We have described a lightweight but powerful brokering mechanism. It is 
lightweight because it employs methods which are taught routinely to students 
and engineers and we might reasonably expect it to be picked up by those groups 
of people without excessive additional training. It is powerful because it pro- 
vides an expressive capability language based on predicate logic. Nevertheless, 
the current system is merely a prototype which needs further development and 
experimentation. The most immediate points of concern are described below. 

- Our current implementation of the algorithm for generating brokerage structures 
operates in a Prolog-like style, simply attempting to satisfy goals from capabilities 
using a depth-first search. This has all of the problems which are familiar from normal 
logic programming, including the potential for non-terminating search. This problem 
can be addressed by applying well known methods of search control (the simplest of 
these being to limit the size of brokerage structures). 

- In large agent systems the broker may be able to generate huge numbers of 
brokerage structures by combining the capabilities of different agents. In such cases it 
is necessary to have a means of choosing preferred structures. There are some obvious 
heuristics for making this choice, for example by preferring smaller brokerage structures 
or structures which involve the fewest agents. These are, however, only heuristics and 
it is not yet clear how effective they would be. 

- Our method for describing correspondences by direct association between agents 
is primitive compared to the forms of knowledge sharing envisaged by others (see 
Section 5). Because our work is at an early stage when additional problems are a dis- 
traction, we have avoided this issue but it seems straightforward to allow the possibility 
of an interlingua rather than a direct mapping to form correspondences between agent 
ontologies in our mechanism. 

- The language we use for performatives is simplistic compared to systems like 
KQML. We have yet to explore how brokerage structures may translate to more so- 
phisticated performative languages. 

- Our method assumes, as all such systems must, that the connection between 
advertised capabilities and actual capabilities is sufficiently robust. In other words, if 
a broker constructs brokerage structures for queries from the capability descriptions 
of which it has knowledge then the performatives generated from those structures will 
answer a large number of those queries reliably. We say “a large number” rather than 
“all” because it is not possible to guarantee reliability in an open system, where we have 
no idea how carefully each agent was engineered. To improve reliability we need good 
engineering methods to connect the internal operations of agents to their advertised 
capabilities. 

References 

1. P. Borst, H. Akkermans, and J. Top. Engineering ontologies. Intl. Journal of 
Human-Computer Studies, 46:365-406, 1997. 

2. F. S. Correa da Silva, W. W. Vasconcelos, and D. Robertson. Cooperation 
Between Knowledge Based Systems. 4th World Congress on Expert Systems, 1998, 
Mexico:819-825. 

3. F. S. Correa da Silva, W. W. Vasconcelos, J. Agusti, D. Robertson, and A. C. 
V. Melo. Why Ontologies are not Enough for Knowledge Sharing. 12th IEA/AIE 
(LNAI v. 1611), 1999, Egypt:520-529, Springer-Verlag. 

4. F. S. Correa da Silva, R. C. Araujo, J. Agusti, and A. C. V. Melo. Knowledge 
Sharing Between a Probabilistic Logic and Bayesian Belief Networks. IPMU’2000 
(accepted). 

5. T. Finin, Y. Labrou, and J. Mayfield. KQML as an Agent Communication Language. 
In J. M. Bradshaw, editor, Software Agents. AAAI Press/MIT Press, 1997. 

6. G. Gazdar and C. Mellish. Natural Language Processing in Prolog: An Introduction 
to Computational Linguistics. Addison Wesley, 1989. 

7. M. Genesereth. Knowledge Interchange Format. In Procs. 2nd National Conference 
on Principles of Knowledge Representation and Reasoning:599-600. Morgan 
Kaufmann, 1991. 

8. Y. Labrou, T. Finin, and Y. Peng. Agent Communication Languages: The Current 
Landscape. IEEE Intelligent Systems, March/April:45-52, 1999. 

9. V. Nagamuta. Coordinating Mobile Agents Through a Broadcast Channel. MSc 
dissertation, Universidade de Sao Paulo (in Portuguese), 1999. 

10. D. Robertson, A. Bundy, R. Muetzelfeldt, M. Haggith, and M. Uschold. Eco-Logic: 
Logic-Based Approaches to Ecological Modelling. MIT Press (Logic Programming 
Series), 1991. 

11. V. Vassalos and Y. Papakonstantinou. Expressive Capabilities Description 
Languages and Query Rewriting Algorithms. Journal of Logic Programming, 43:75-122, 
2000. 

12. G. Wickler. Using Expressive and Flexible Action Representations to Reason About 
Capabilities for Intelligent Agent Cooperation. PhD thesis, Division of Informatics, 
University of Edinburgh, 1999. 



Model-Based Control for Industrial Processes 
Using a Virtual Laboratory 



R.T. Bui¹, J. Perron², and C. Pillion¹ 

¹ Université du Québec à Chicoutimi 
Chicoutimi, Québec, Canada G7H 2B1 

² Alcan International Limited 
Jonquière, Québec, Canada G7S 4K8 



Abstract. In the metallurgical industries, thermophysical processes are used in 
large numbers for the processing of materials in successive stages. Those 
processes are complex and they operate in hostile conditions and with poor 
accessibility. Model-based control in such cases is useful for designing and 
testing control strategies. The concept of a virtual laboratory consists in 
combining real and virtual processes with real and virtual controllers; these four 
elements communicate with one another locally or through the Internet. 
Researchers in the laboratory and operators on the plant floor can work together 
at a distance and in real time to solve process control problems by applying 
various control strategies and testing the solutions. This paper reports the work 
undertaken for setting up the virtual control laboratory (VCL) and gives an 
example of model-based control design carried out as an application of the VCL 
concept. The example is drawn from the adaptive control design of an 
aluminium casting furnace. 



1 Introduction 

The metallurgical industries are characterized by a great variety of thermophysical 
processes which require a wide range of control techniques. These processes are used 
in sequence in the successive stages of production where the output product of one 
process serves as input material for the next process further down the line. Each 
process is subject to tight criteria and each intermediate product can have high 
added value provided that the process is properly conducted. Examples could be 
taken from the aluminium industry. Commercial alumina, if well processed, has a 
commercial value several times higher than that of the trihydrates. Aluminium, once 
transformed into metallic composites, yields an added value of a few hundred per 
cent. 

Those industrial processes are also characterized by a limited accessibility, due to the 
often very hostile environment in which they operate. Mathematical models which 
are run on computers can serve as process simulators - also called virtual processes - 
providing a risk-free alternative to the tests performed on the real process. With the 
advent of the new information and communications technologies people working with 



R. Loganantharaj et al. (Eds.): IEA/AIE 2000, LNAI 1821, pp. 671-680, 2000. 
© Springer-Verlag Berlin Heidelberg 2000 







virtual processes and those working on real processes can collaborate at a distance 
and in real time. The availability of powerful computers makes it possible not only to 
calculate at high speed but also to emulate the workings of the human brain, and thus 
enable us to apply to process control the many new techniques loosely grouped as 
artificial intelligence (AI), from knowledge bases and expert systems to fuzzy logic, 
neural networks and genetic algorithms. 

This paper describes the work done for the setting-up of a control laboratory that is 
partially virtual, dedicated to the study and implementation of control strategies for 
industrial thermophysical processes. Examples will be drawn from studies 
undertaken recently using the new laboratory's facilities, and promising research 
directions are suggested. 



2 The Virtual Laboratory 

The laboratory is built on four main components, namely the virtual and real processes 
and the virtual and real controllers. Three of the four components are located in the 
laboratory whereas the real process, which may be a 100 T aluminium casting furnace 
or a 200 ft. rotary coke calciner, stays where it belongs, i.e. at the plant. All four 
components are linked together either through the local network or the Internet, thus 
allowing researchers, managers, operators to work together ignoring the geographical 
distance. 







Fig. 1. The basic concept of the virtual control laboratory 



Model-Based Control for Industrial Processes Using a Virtual Laboratory 673



This concept of a virtual laboratory for process control is illustrated in Figure 1 and
presented in some detail in Reference [1]. The virtual process (the process simulator)
and the virtual controller (also called the control emulator) are computer programs.
The real controller is hardware, mostly programmable logic controllers (PLCs). Any two
of the four components of the laboratory can be put to work together for the purpose of
developing, testing and analyzing a control logic or a control strategy prior to applying
it to the real process through the real controller. Figure 1 shows a number of possible
applications obtained by linking those components. As a general rule, the applications
shown at the top of the figure involve less risk than those shown at the bottom; the
ultimate stage, clearly, is the coupling of the real controller with the real process.

Thus, coupling the virtual process with the virtual controller (link 1-2) allows us to
test new control strategies without risk, carry out parameter studies,
select the best dynamic response, or analyze the sensitivity of the various process
variables. As a prerequisite, both the virtual process and the virtual controller must
first have been shown to be representative through calibration and
validation.

A coupling of the virtual process with the real process (link 1-4) can be done through 
proper interfaces, input-output channels and analog-digital converters. Such a 
coupling allows us to fine-tune the virtual process using data from the real process, or 
to analyze the real process using its virtual counterpart as a tool, or to update and 
improve the latter based on data coming from the former. This last application is 
particularly meaningful when we consider that with time and aging, the real process's 
parameters may change, and such change must be reflected by its simulator. 

Still another example of coupling of components can be seen when the real controller 
is allowed to act on the virtual process (link 1-3). This helps us test new control 
strategies - implemented in the real controller - and their dynamic effects on the 
virtual process. This step is useful and even necessary prior to the final stage of 
applying the real control to the real process. 

At this stage two systems have been selected for implementation in the virtual
laboratory, namely the control of an aluminium casting furnace and that of an
aluminium electrolytic cell. They were chosen because they are representative of a
large number of processes used in the aluminium industry. A detailed description of
the casting furnace or the electrolytic cell is not within the scope of this paper. Such
information can be found in [2] [3] for the furnace and in [4] for the cell. The casting
furnace's control system is presented hereunder.

Figure 2 gives the general arrangement of the linking of the four components
mentioned previously. The real process, which is located not in the laboratory but in
the plant, is represented here by its operator interface, which constitutes the outermost
gate through which the outside world can look into the system. Attention has been paid to
the selection of the hardware used for the operator interface, and even to the layout of
its front plate; the purpose is to have the best possible replica of the real
configuration used in the plant. The real controller is implemented in the PLCs. The
virtual controller is nested in a computer where







[Figure 2: two computers (a Pentium II 350 MHz and a Pentium II 450 MHz), labelled as the
interface for programming and the interface for control. Each runs the InTouch control
interface, the PanelMate, Concept, Taylor and InControl programming software, and the
ExperTune analysis and data acquisition software. They are linked to the real controller
(PLCs), the operator interface and the virtual furnace through: Modbus Plus communication
with the PLC in real time; Ethernet communication (sending of programs to the PLC); Modbus
data acquisition; the SuiteLink communication protocol from Wonderware; I/O cabling and D/A.]

Fig. 2. Networking of the main components of the virtual control laboratory

programming software of various kinds is used to program both the virtual and the
real controller. Here again, special care is taken to apply the same software as that
commonly used in the aluminium industry for programming the control logic, the
control interface and the operator interface. Still other elements of the software
enable access to the control system via the Internet, using passwords and
other means to ensure privacy whenever an industrial application so requires.
Operational events are handled in real time by standard query language, and
communication between the operator interface and the PLCs is established as a
continuous dialog in real time.

In terms of programming, clearly the most complex component is the process 
simulator, i.e. the virtual process. This is expected, because in the first place, it is the 
complexity of the process that is at the origin of strict requirements in control. 











Experience shows that a user-friendly, menu-driven graphic interface is a useful
addition to the virtual process. Such an interface even becomes necessary when we
consider that the process simulator reads its input data from data files, and that
changes to such data must otherwise be made through a text editor. In that case
there is no validation of the input data, and if for some reason the file format is
changed, the simulator will not be able to read the data correctly.




(1) ActiveX for the representation of the casting furnace
(2) Windows for events
(3) Initialization of the simulator's variables
(4) Execution
(5) Graphic animations

Fig. 3. Graphic interface of the virtual furnace

A view of the graphic interface for the virtual casting furnace is shown in Figure 3. It 
has been built from the same blueprints of the furnace used in the plant, so as to keep 
a familiar environment for the operators. Here a cross-section of the furnace is seen, 
with the gas body on top of the liquid metal. The burner and part of the solid metal 
are also shown. 

A modular programming approach was applied, using graphic modules that can be modified at
will, and modifiable windows for writing the input files. The output windows, by contrast,
remain unchanged. Menus are simple and commands are intuitive. Only a small
number of buttons are used and they are all visual. This helps to minimize the
learning effort. Finally, it is important to note that the graphic interface is the only
point of contact between the virtual furnace and its user, and no other direct contact
between the two is possible.










3 An Application Example 

Although the two-year program for the setting-up of the laboratory is still underway,
application examples have been carried out involving the virtual casting furnace and
its controller. The purpose was to verify the validity of the Virtual Control Laboratory
(VCL) concept and to explore avenues of research in process control for the new laboratory.

In the following example, the virtual furnace was activated, various control strategies 
were applied to it, and its responses were analyzed. 

The starting point for building the virtual furnace was the mathematical model of the
aluminium casting furnace [2] [3]. The mathematical model was the result of an
elaborate project including a tedious exercise in model validation using plant data.
The virtual furnace built from that model is a dynamic simulator in one dimension,
namely the depth of the metal and the thickness of the refractory roof of the
furnace chamber. It takes into account all the successive phases of the operation of
the furnace: loading, heating, stirring, alloying, skimming, fluxing, second heating, etc.
The energy equation is solved in one dimension for each component of the furnace:
the roof, the gas body, the liquid metal, the solid metal, the floor and the insulation.
The melting of metal is solved by the enthalpy method coupled with the Kirchhoff
transform of the thermal conductivity. The momentum equation is not solved for the
liquid metal, as natural convection plays a minor role. On the other hand, the forced
convection caused by stirring is important. It destroys the vertical
gradient in the metal temperature, and this is represented by a correlation calibrated
experimentally.

The partial differential equations are solved by a finite-difference method and the 
program is written in Fortran. 
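
As a rough illustration of the finite-difference approach mentioned above, the following
sketch advances a one-dimensional transient conduction equation with an explicit scheme.
It is not the simulator's code: it is written in Python rather than Fortran, it leaves out
the enthalpy method and the Kirchhoff transform, and the function names, material
properties, grid size and boundary temperatures are all hypothetical values chosen only
for the example.

```python
import numpy as np


def step_conduction(T, alpha, dx, dt):
    """Advance a 1-D temperature profile by one explicit time step.

    T holds nodal temperatures. The two end nodes are treated as
    fixed-temperature boundaries and are left unchanged.
    """
    r = alpha * dt / dx**2
    if r > 0.5:
        # Stability limit of the explicit (forward-time, centred-space) scheme.
        raise ValueError("explicit scheme is unstable for alpha*dt/dx**2 > 0.5")
    T_new = T.copy()
    T_new[1:-1] = T[1:-1] + r * (T[2:] - 2.0 * T[1:-1] + T[:-2])
    return T_new


def simulate_slab(n_nodes=21, thickness=0.3, alpha=1.0e-6,
                  t_hot=1000.0, t_cold=50.0, t_init=50.0, n_steps=2000):
    """Heat a slab from one face and return the final temperature profile.

    All default values are illustrative assumptions, not plant data.
    """
    dx = thickness / (n_nodes - 1)
    dt = 0.4 * dx**2 / alpha          # keeps r = 0.4, inside the stable range
    T = np.full(n_nodes, t_init)
    T[0], T[-1] = t_hot, t_cold       # fixed-temperature faces
    for _ in range(n_steps):
        T = step_conduction(T, alpha, dx, dt)
    return T
```

Calling `simulate_slab()` with the defaults runs long enough for the profile to settle
close to the straight line between the two face temperatures, which is the expected
steady state for constant properties.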

The number of design and operating parameters involved is in the order of 450, which 
is another indication that it is difficult to make decisions based on intuition and 
experience alone. This stresses the need for model-based process analysis and 
control. 

The virtual controller (the control simulator) offers three options. The most advanced
is the one with conventional PID feedback, in which the fuel flow and the combustion
air flow follow a feedback control based on the difference between the set-point metal
temperature and the actual metal temperature.
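
The feedback option described above can be sketched as a discrete PID loop acting on the
temperature error. This is a generic textbook form, not the control simulator's actual
code; the class name, the gains and the output limits (standing in for the minimum and
maximum fuel flow) are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PIDController:
    """Textbook discrete PID with output clamping (illustrative only)."""
    kp: float
    ki: float
    kd: float
    out_min: float = 0.0      # hypothetical minimum fuel flow
    out_max: float = 100.0    # hypothetical maximum fuel flow
    _integral: float = 0.0
    _prev_error: Optional[float] = None

    def update(self, setpoint: float, measurement: float, dt: float) -> float:
        """Return the clamped controller output for one sampling interval."""
        error = setpoint - measurement
        self._integral += error * dt
        if self._prev_error is None:
            derivative = 0.0
        else:
            derivative = (error - self._prev_error) / dt
        self._prev_error = error
        raw = self.kp * error + self.ki * self._integral + self.kd * derivative
        return min(self.out_max, max(self.out_min, raw))
```

With a purely proportional setting such as `PIDController(kp=2.0, ki=0.0, kd=0.0)`, a metal
temperature far below the set point drives the output to its upper limit, and a temperature
above the set point drives it to zero.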

It was observed, both in the plant and in laboratory simulations, that such a scheme 
yields a fuel control close to an on-off behavior accompanied by a metal temperature 
overshoot, which operators want to avoid or minimize. 

The control simulator was then modified to change from conventional feedback
control to adaptive control, using as adaptive parameter the temperature of the hot
reverberatory roof of the combustion chamber. Then, instead of the conventional
scheme of Figure 4, the adaptive control scheme of Figure 5 was used. The roof
temperature comes from simulation results given by the virtual furnace, whereas the
parameters Pm and Pa are obtained by trial and error.







The equation giving P in the upper block of Figure 5 shows that if Pm is set at a high 
value, the system moves toward an on-off control, and if the value of Pa increases, the 
system moves closer to an adaptive control. 
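
The equation for P appears only in Figure 5 and is not reproduced in the text, so the
sketch below is not the authors' formula. It shows one hypothetical gain schedule that
behaves as described: the gain starts from Pm and is reduced in proportion to Pa times the
excess of the roof temperature over the metal set point. The linear form, the clipping to
the range 0 to Pm and the function name are all assumptions.

```python
def adaptive_gain(p_max, p_a, t_roof, t_set):
    """Hypothetical gain schedule (an assumption, not the equation of Figure 5).

    P = Pm - Pa * (Troof - Tset), clipped to the range [0, Pm].
    """
    p = p_max - p_a * (t_roof - t_set)
    return max(0.0, min(p_max, p))
```

Under this assumed form, raising Pm raises the gain for a given roof temperature, which
pushes the loop toward on-off behavior, while raising Pa makes the gain fall off faster as
the roof heats up, which strengthens the adaptive effect.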

Figures 6 to 8 present the results of simulations using the virtual furnace and the 
virtual controller. In Figure 6, Pm is set at 25 and Pa at 0.4. The curves show a more 
gradual decrease of fuel flow (curve #4), as compared to the base case where 
conventional feedback was applied (curve #5). Fuel is cut off when metal 
temperature (curve #1) reaches the 760°C set point. The figure also shows that the 
metal temperature takes more time to reach its set-point value but the overshoot is 
smaller than in the base case (curve #2). 

Figure 7 corresponds to a higher Pm value of 35, while Pa is kept at 0.4. As expected, 
the system moves closer to on-off control and this can be seen by comparing the two 
fuel flow curves (#4 and #5) or the two metal temperature curves (#1 and #2) which, 
in this case, practically coincide. 

In Figure 8, Pm is set at 25 while Pa is reduced from 0.4 to 0.35. A comparison with
Figure 6 shows that the system is now close to adaptive control but the adaptive effect
is diminished due to a smaller value of Pa.



[Figure 4: block diagram of the feedback loop; the legible labels are Tset, P and
Virtual Furnace.]

Fig. 4. The conventional control scheme.

Tset = metal temperature set point; Tmet = metal temperature measured at thermocouple




Fig. 5. The adaptive control scheme.

P = proportional gain; Pm = maximum value for the proportional gain (value by
default); Pa = weighting factor for the difference between roof temperature and set
point metal temperature; Troof = temperature of the reverberatory roof of furnace.







[Figure 6: plot against time (seconds), axis range 10000 to 20000. Curves: #1 T61,
#2 T61 base case, #3 SP, #4 NC, #5 NC base case. T61 = temperature of liquid aluminium
in °C, NC = fuel flow in m3/h, SP = set point 760°C.]

Fig. 6. Adaptive control versus conventional control
Pm = 25, Pa = 0.4

[Figure 7: plot against time with the same curves as Figure 6. SP = set point 760°C.]

Fig. 7. Adaptive control versus conventional control
Pm = 35, Pa = 0.4






Fig. 8. Adaptive control versus conventional control
Pm = 25, Pa = 0.35



4 Conclusion 

A laboratory has been set up for the design of industrial process controls, with
emphasis on metallurgical processes. The main components, virtual and real, are
operational, and application examples have been carried out to illustrate the concept
of the virtual laboratory. The example shown, on the adaptive control design of an
aluminium casting furnace, illustrates the feasibility of using the components of the
laboratory to perform the analysis and optimization of process control systems. Tests
have now been successfully conducted for the networking of the laboratory
components through the Internet, including security measures such as passwords,
qualified access, coding and decoding of information and data, and error detection and
correction. Once operational, this networking will allow the calibration of virtual
processes on real processes located at faraway plants, as well as the testing of new
control schemes using the virtual and real components of the laboratory.



Acknowledgements 

The authors are indebted to the Canadian Foundation for Innovation (CFI), the 
Ministry of Education of Quebec (MEQ) and the Centre Quebecois de Recherche et 
Developpement en Aluminium (CQRDA) for the funding of the infrastructure of the 






new Virtual Control Laboratory (VCL). The industrial support from Alcan's Arvida 
Research and Development Center (ARDC) is deeply appreciated. The authors 
acknowledge the valuable contributions to the setting up of communication and 
control systems at the VCL by S. Doyon and D. Desrosiers (real controller and 
graphic interface) and V. Villeneuve (communications and data processing). 



References 

1. Bui, R.T., Tikasz, L. and Perron, J.: "Trends in intelligent process control
methods in the primary aluminum industry". Proceedings of the Second Int.
Conf. on Intelligent Processing and Manufacturing of Materials, IPMM'99, J.A.
Meech et al. Eds., Honolulu, (1999) 749-754.

2. Perron, J.: "Modélisation mathématique simplifiée d'un four de métal chaud".
Master of Engineering Thesis, Université du Québec à Chicoutimi, (1987) 201
pages.

3. Bui, R.T. and Perron, J.: "Performance analysis of the aluminum casting
furnace". Metallurgical Transactions, (1988) 19B, 171-180.

4. Tikasz, L., Bui, R.T. and Potocnik, V.: "Aluminum electrolytic cells: a
computer simulator for training and supervision". Engineering with Computers,
(1994) 10, 12-21.




Autonomous Agents for Distributed Problem 
Solving in Condition Monitoring 



E.E. Mangina*, S.D.J. McArthur, J.R. McDonald
Department of Electronic & Electrical Engineering,

Centre for Electrical Power Engineering
University of Strathclyde
Scotland, U.K.

* Corresponding author: eleni.mangina@strath.ac.uk 



Abstract: The application of intelligent systems for data interpretation and
condition monitoring is an advancing field of research. In recent years
autonomous intelligent agents and multi-agent systems have gained much
attention within different real time applications. This paper introduces the novel
idea of COMMAS (Condition Monitoring Multi-Agent System), a hierarchical
decentralised multi-agent architecture developed for data interpretation and
condition monitoring applications. It employs groups of different kinds of
intelligent agents to cope with the variety of application functions by using
distributed problem solving and different computational intelligence techniques.
The design and functionality of the diversity of agents, along with the key issues
of the multi-agent system as a whole, are described. This paper demonstrates how
agent technology overcomes problems associated with centralised approaches in
condition monitoring, and illustrates the new opportunities agents can provide.



1 Introduction 

A variety of intelligent techniques have been applied in plant monitoring, which resulted
in the development of centralised approaches for condition monitoring, e.g. Knowledge
Based Reasoning (KBR) Systems [1], Model Based Reasoning (MBR) Systems [2],
Case Based Reasoning (CBR) Systems [3], Artificial Neural Networks (ANN) [4] etc.
By definition condition monitoring is concerned with detecting and distinguishing faults
occurring in the plant that is being monitored [5]. The early diagnosis and
identification of faults therefore has a number of benefits (improvement in the plant
economy, reduction in operational costs, improved level of safety etc.). COMMAS addresses a
new area in intelligent plant monitoring as it supports the use of more than one
computational intelligence technique through agent technology, in order to interpret the
plant data and derive meaningful conclusions.

The objective of this work is to improve the accuracy of present systems by
taking the next step and promoting a decentralised and distributed intelligence approach
for condition monitoring. The proposed framework will support data fusion, cross sensor
corroboration and decision support functions through intelligent system based data
interpretation. The intelligent agents within COMMAS are computer systems capable of
flexible autonomous actions. They are designed to replicate the diagnostic tasks
performed by engineers, while allowing co-operation and exchange of information. The
total expertise is distributed among all the agents, each having only partial knowledge of
the complete problem to be solved. Different teams of these agents compose the
multi-agent system, which is able to work in a dynamic environment (as the state of the


R. Loganantharaj et al. (Eds.): IEA/AIE 2000, LNAI 1821, pp. 683-693, 2000.
© Springer-Verlag Berlin Heidelberg 2000







monitored plant might change over time). As a result, it will have the ability not only to 
infer the state of the plant, but also to predict when a serious condition or failure may 
occur. 

This paper focuses on the functionality of the multi-agent system, which requires 
groups of application specialised autonomous agents, as well as on a number of research 
issues derived through this framework, e.g. communication, co-ordination, evaluation 
and learning abilities of agents. 

2 Condition Monitoring: From a Centralised to an Agent Based 
Approach 

A key function of intelligent condition monitoring is data interpretation. This has
traditionally been the domain of recognised power engineering specialists. The use of
structured knowledge elicitation and modelling techniques has led to the creation of
condition monitoring and diagnostic knowledge bases. These have been implemented as,
for example, knowledge based condition monitoring systems [6]. The general structure of
the automatic monitoring systems traditionally developed is given in Figure 1, where the
sensor data is processed by an intelligent system, which employs a specific
computational intelligence technique to derive a meaningful conclusion or diagnosis.




Fig 1. Condition Monitoring Traditional 
Intelligent System 




Fig 2. Condition Monitoring Multi- 
Agent System 



In contrast with this centralised approach, COMMAS, as shown in Figure 2, is not a
stand-alone intelligent condition monitoring system. It is a hierarchical layered approach,
where the intelligent reasoning is distributed to each layer. The intelligent agents have
been grouped into three categories, based on the distribution of tasks following the
philosophy of condition monitoring [7]. These agents can then make use of intelligent
data interpretation techniques, such as KBR, CBR, MBR etc. Such an approach is in
contrast to most recent approaches of agent-based applications, which have focussed on
single agents with general capabilities to perform a wide range of user-delegated
information finding tasks [8]. These have several limitations: the need for a vast amount







of knowledge in order to provide coverage for a variety of tasks; the inability of most
single agents to deal dynamically with the appearance of new agents and information
sources. Within COMMAS, these problems will be overcome, as agents will access,
filter, evaluate, interpret and combine this information.

The multi-agent system proposed in this paper can work with different tasks,
organise itself to avoid processing bottlenecks, and cope with dynamic changes in the
agent and information-source landscape. Within COMMAS the data interpretation
functionality is the result of agent communication and co-operation between and within
the three different layers in conjunction with underlying intelligent system techniques.

3 COMMAS: Application and Intelligent Functionality 

COMMAS is a multi-agent system in which the responsibilities have been distributed
between the agent components, based on the application functions of the condition
monitoring procedure. As shown in Figure 3a, the first layer of agents (Attribute
Reasoning Agents - ARA) monitor and interpret sensor data in order to detect any
significant deviations from expected behaviour. Cross sensor corroboration then takes
place at the second layer (Cross Sensor Corroboration Agents - CSCA), based on the results
from the previous layer. The final layer of agents (Meta Knowledge Reasoning Agents - MKRA)
will interpret the results of the CSCA to identify plant faults and provide engineering
diagnoses, hence evaluating the state of the plant. In order to accomplish all these tasks
(to infer intelligently), each agent has to employ one (or more) reasoning mechanisms.
Figure 3b shows the intelligent functions required for the agents' reasoning (and learning)
abilities, their goals, beliefs, skills etc. Every agent is developed as a social entity,
which is implemented by inter-agent communication (using communication protocols).

The software components within COMMAS employ all of the basic agent
characteristics that have been defined (autonomy, reactivity, pro-activeness and social
ability) [9]. The three distinct groups of agents that are of use within COMMAS, along
with the communication issues between them, and agent identification, are described in
the following subsections:

3.1 Attribute Reasoning Agents (ARA) - Situated Agents 

Independent software modules monitor each sensor on the plant and give information
about the attribute they monitor (i.e. temperature, pressure etc.). They emulate the steps
that are followed in condition monitoring, where each specialist has explicit knowledge
about each sensor of the plant, with only a partial view of the whole plant corresponding
to their area of expertise. In terms of agent identification they are situated agents, which
were first introduced by Pattie Maes (1991) [10] and have to be autonomous software
agents that are situated in the (software) environment, but without having to be aware of
all aspects of reasoning activity. They operate based on certain functions, exploiting their
own methods of reasoning. Although they do not have agent wide communication skills
and they are focused on their process, their results are not local; they are sent to other
agents, which can then exploit them.

In this approach, ARA intelligent agents with local problem solving techniques
will perform the interpretation of plant sensor data and associated information. Their
responsibilities, both in terms of application specific and intelligent inference
functions, are shown in Figures 3a and 3b respectively. Their processes will be based on







the constraint limits that characterise each attribute (i.e. the rate of change, the upper
and lower limits etc.). The reasoning can be based on artificial intelligence techniques
that have already been used in the field, like those employed by knowledge based
systems. Their results will then be forwarded in messages to the CSCA, to inform them
of any changes in the attributes and by how much. During the sensor data processing, the
ARA have to keep track of the attributes' changes over time in a database. These files
and the files that contain information about the attributes' constraints are restricted
to access by the ARA and MKRA only, in case they want to reason using Case Based
Reasoning (CBR), or update the limits of the attributes, if this is required in the future.
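
The limit checking described above can be sketched as follows. This is only an
illustration of the idea, assuming simple threshold rules; the class names, the message
fields and the in-memory list standing in for the database are hypothetical and are not
taken from the COMMAS implementation.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class AttributeConstraints:
    """Constraint limits for one monitored attribute (illustrative)."""
    lower: float
    upper: float
    max_rate: float  # largest acceptable absolute change per second


@dataclass
class AttributeReasoningAgent:
    """Toy ARA: checks one attribute against its limits and keeps a history."""
    attribute: str
    constraints: AttributeConstraints
    history: List[Tuple[float, float]] = field(default_factory=list)

    def observe(self, timestamp: float, value: float) -> Optional[dict]:
        """Record a reading; return a message for the CSCA if a limit is broken."""
        violations = []
        change = 0.0
        if value < self.constraints.lower:
            violations.append("below_lower_limit")
        if value > self.constraints.upper:
            violations.append("above_upper_limit")
        if self.history:
            t_prev, v_prev = self.history[-1]
            change = value - v_prev
            dt = timestamp - t_prev
            if dt > 0 and abs(change) / dt > self.constraints.max_rate:
                violations.append("rate_of_change")
        self.history.append((timestamp, value))  # stand-in for the database
        if not violations:
            return None
        return {
            "sender": f"ARA:{self.attribute}",
            "receiver": "CSCA",
            "attribute": self.attribute,
            "value": value,
            "change": change,
            "violations": violations,
        }
```

A reading inside the limits returns `None`, so only significant deviations generate
traffic toward the next layer, while every reading is still kept in the history.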



[Figure 3a: diagram of the three agent layers; the only legible label is "Inform".]

Fig 3a. Domain functionality in COMMAS



3.2 Cross Sensor Corroboration Agents (CSCA) - BDI Agents 

The COMMAS architecture is designed to permit co-operative and competitive problem
solving among agents in order to resolve uncertainties and conflicts. This is due to the
diversity of the data being monitored on the plant. The ARA first identify abnormalities.
The CSCA simulate the procedures and reasoning used for Cross Sensor Corroboration
by experts. The solution produced as a result of the inter-agent communication at this
layer will finish with conclusions about the relationships between the attributes of the
plant (i.e. temperature - pressure). The interaction at this stage is based on an iterative
communication between the agents, which involves sending and receiving messages as
well as the creation of hypotheses, with different results depending upon the
relationships between pairs of attributes, and based on the agents' beliefs, desires and
intentions. Rao & Georgeff (1995) [11] first presented agents that have an internal
representation of their own beliefs, desires and intentions (BDI) based on the facts of
the environment








they are working in at a certain time. In COMMAS, the CSCA are BDI agents, which have
to work with temporal logic, as they communicate and exchange messages concerning
their internal reasoning, beliefs, desires, intentions, plans and goals, which will have to
be represented in terms of possible worlds over different and/or the same time intervals.
Due to the tasks these agents have to accomplish (Figures 3a and 3b), they can use KBR
(based on knowledge retrieved from the experts), artificial neural networks (ANN)
(which will be trained on historical data to correlate sensor data with possible problems)
and CBR (referring to past cases of relationships, by using the database where past
attribute relationships and sensor faults have been stored). The final result at this layer
will be the determination of whether there is a sensor failure, and the relationships of
attributes that caused it. The CSCA can justify their conclusions based on their beliefs,
goals and intentions over time. These final conclusions will then be sent to the MKRA,
where the manipulation of this information related to the plant takes place, and if the
MKRA notice any important change, they can modify the CSCA's a priori knowledge about the
state of the plant and the attributes.
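
A very reduced sketch of cross sensor corroboration is given below. It does not model
beliefs, desires and intentions or temporal logic; it only illustrates how deviations
reported for pairs of related attributes might be turned into hypotheses. The rule used
(both attributes deviating suggests a plant condition, only one deviating suggests a
possible sensor failure), the function name and the dictionary fields are assumptions made
for the example.

```python
def corroborate(deviations, related_pairs):
    """Form hypotheses from deviations reported for related attribute pairs.

    deviations: dict mapping attribute name to True if a deviation was reported.
    related_pairs: list of (attribute_a, attribute_b) tuples expected to move
    together. Both structures are hypothetical inputs for this sketch.
    """
    conclusions = []
    for a, b in related_pairs:
        dev_a = deviations.get(a, False)
        dev_b = deviations.get(b, False)
        if dev_a and dev_b:
            # Both related attributes deviate: the readings corroborate each other.
            conclusions.append({"pair": (a, b), "hypothesis": "plant_condition"})
        elif dev_a or dev_b:
            # Only one deviates: the lone reading is suspect.
            suspect = a if dev_a else b
            conclusions.append({
                "pair": (a, b),
                "hypothesis": "possible_sensor_failure",
                "suspect": suspect,
            })
    return conclusions
```

For example, a temperature deviation that is not matched by the related pressure reading
yields a `possible_sensor_failure` hypothesis naming the temperature sensor as the suspect.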




[Figure 3b: inference functions of each agent layer, fed by the plant sensors.]

ARA:
- Data processed by KBR, CBR etc.
- Internal representation (constraints of the attributes)
- Communication skills
- Learning abilities
- Assistance to the operator

CSCA:
- Knowledge of the structure of the plant
- Inference with KBR, ANN, CBR etc.
- Query processing
- Communication skills
- Interpret and identify real faults

MKRA:
- Inform other layers
- Inference with KBR, MBR, CBR, ANN etc.
- Plant's state evaluation
- Management
- Control
- Plan
- Decision functions
- Diagnosis
- Explanation & learning abilities

Fig 3b. Inference functionality in COMMAS



3.3 Meta-Knowledge Reasoning Agents (MKRA) 

These are novel agents that are being developed within COMMAS to perform high-level
diagnostic, management and control tasks in the complex dynamic environment of the
whole software system. They are of crucial importance, as they supervise the software
system and they generate the final condition monitoring and diagnostic conclusions.
From the problem solving point of view, the distribution requires the decomposition of a







problem into a set of subproblems and then composing their solutions. The collection of
the results from the processes at the previous layers takes place at the MKRA layer. For
the CSCA layer, there are many interdependencies among the subproblems, and there
could also be some conflicting results. The manipulation and evaluation of the given
results, the final conclusions, and the management and supervision of the agent society as
a whole are done at the final layer by the group of MKRA, which have knowledge about
the system's state. By co-ordinating the actions and planning of the lower layer agents,
they try to maximise the use of the components of the system and end up with a
reliable conclusion (diagnosis of the plant). For example, they can use ANN to predict
the normal state of the plant in the future, and Model Based Reasoning (MBR) to
diagnose any real sensor or plant faults (Figure 3a). Apart from these tasks, the MKRA
also have to accomplish management tasks for the agent society in the software system
as a whole (i.e. communication, planning, co-ordination of tasks, control etc.), as shown
in Figure 3b. These functions are a novel idea because, after the diagnosis, the MKRA feed
back to the previous layers by sending messages to the agents (ARA and CSCA) with
information about what caused the fault (either an attribute change or a certain attribute
relationship). In this way their databases of historical data will be updated and the
constraint files can be changed. Only the highest layer has an overall view of the current
status of the plant, as the final diagnosis takes place there. The previous layers filter
and interpret the data from the plant and identify the important information.
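
The aggregation and feedback role of the MKRA can be illustrated with the short sketch
below. It assumes that the lower layer delivers conclusions as dictionaries with a
hypothesis field, which is a format invented for this example; the function name, the
field names and the two hypothesis labels are likewise hypothetical, and no ANN or MBR
reasoning is included.

```python
def diagnose_and_feedback(csca_conclusions):
    """Aggregate lower-layer conclusions and build feedback messages.

    csca_conclusions: list of dicts with a "pair" of attribute names, a
    "hypothesis" string and, for sensor failures, a "suspect" attribute.
    Returns (diagnosis, feedback), where feedback lists messages addressed
    to the lower-layer agents stating what caused the fault.
    """
    diagnosis = {"plant_faults": [], "sensor_faults": []}
    feedback = []
    for c in csca_conclusions:
        if c["hypothesis"] == "plant_condition":
            diagnosis["plant_faults"].append(c["pair"])
            # Cause is a relationship between attributes: inform the CSCA.
            feedback.append({
                "receiver": "CSCA",
                "cause": {"type": "attribute_relationship", "attributes": c["pair"]},
            })
        elif c["hypothesis"] == "possible_sensor_failure":
            diagnosis["sensor_faults"].append(c["suspect"])
            # Cause is a change in a single attribute: inform that attribute's ARA.
            feedback.append({
                "receiver": "ARA:" + c["suspect"],
                "cause": {"type": "attribute_change", "attribute": c["suspect"]},
            })
    return diagnosis, feedback
```

The two kinds of feedback message mirror the two causes named in the text, an attribute
change or an attribute relationship, so that each lower-layer agent can update its own
history or constraints.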

3.4 Benefits of Multi-agent Systems in Condition Monitoring 

COMMAS has been designed as an interactive system composed of self-reliant and
independent subsystems (agents). The distributed software agents will be developed and
tested independently of one another. They will be able to function effectively, due to
their communication skills, even though they have an incomplete view of the problem (ARA
and CSCA). The overall solution will be given by the MKRA, after the aggregation of
the partial solutions in a dynamic fashion (due to the interactions) from the lower layers.
The system "knows" what has been achieved and what has to be accomplished, based on
its own predictions and the past processes. If a change occurs in the future and the
software agents ARA and/or CSCA in the lower layers have to modify their processes,
they have to be informed by the MKRA. This will make the system adaptable, flexible to
sudden changes of the monitored plant and easily extensible, as more agents can be
added to the society of COMMAS if needed. This work was developed for
power engineering applications, but it can be applied in many different areas
concerning condition monitoring.

4 Implementation Issues 

In the past, systems built for condition monitoring were employed as 
stand-alone data interpretation processes, whereas distributed engineering systems have 
shown effective results [12]. Instead of gathering the data and automatically processing 
and extracting the information, this paper presents a distributed system within an 
organisation of agents. Agents are computer systems capable of flexible, autonomous 
action. Their technology has already been used in several projects to supply information 
required for making various types of decisions [13] [14]. In this project, as more 
sensor data becomes available at different times and from multiple locations, it becomes 
difficult and time consuming for a person to collect and evaluate them in order to infer 
the state of the plant. COMMAS covers all the concepts of a multi-agent system [15], 
uses a number of different reasoning mechanisms (described in the previous section), 
and generates a meaningful interpretation of the sensor data by communicating the 
partial knowledge that the software agents have stored from the monitored plant. This 
allows the system to deal with incomplete and inconsistent data during the process of 
condition monitoring. 

COMMAS is a prototype multi-agent based system, which is being developed 
to provide the communication framework between the agents, composed of a set of 
layered protocols, following the proposed three-layered approach. It makes use of the 
Java Development Kit available from Sun Microsystems, Inc. [16], within the Java 
Agent Template (JAT) environment. The JATLite package (available from the Agent 
Based Engineering Group, which is part of Stanford University’s Centre for Design 
Research) is a set of programs written in the Java language which allow users to 
create software agents that communicate through the Internet [17] and are able to run 
on different platforms. The template provided by 
JATLite allows users to develop their own intelligent functions. All the agents have 
to be registered with the Router (an agent provided by JATLite), so that when one of them 
“wants” to communicate with another, it just has to refer to the other agent’s name and the 
message will be sent. If the receiving agent is disconnected, the Router will 
store the incoming messages until it reconnects [18]. The communication capabilities are 
provided through the Router, which controls interaction in a multi-agent environment, 
and Java classes, which control registration and connection to the Router. The Router is 
appropriate for the prototype system of this work, while the underlying JATLite 
functions will allow COMMAS’ agents to access the appropriate data and information 
sources, as shown in Figure 4. 
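
The Router behaviour described above (registration by name, delivery by name, and storage of messages for a disconnected receiver) can be illustrated with a minimal sketch. It is written in Python for brevity and is not the JATLite API: the class `Router`, its methods and the agent names `ARA-T` and `CSCA-T-P` are hypothetical and only mirror the behaviour stated in the text.

```python
from collections import defaultdict, deque


class Router:
    """Toy name-based message router (illustrative only, not the JATLite API)."""

    def __init__(self):
        self.registered = set()              # names of all known agents
        self.connected = {}                  # agent name -> inbox, while connected
        self.pending = defaultdict(deque)    # agent name -> messages stored while offline

    def register(self, name):
        self.registered.add(name)

    def connect(self, name):
        if name not in self.registered:
            raise KeyError(f"unregistered agent: {name}")
        # Messages stored while the agent was away are delivered on (re)connection.
        inbox = list(self.pending.pop(name, ()))
        self.connected[name] = inbox
        return inbox

    def disconnect(self, name):
        self.connected.pop(name, None)

    def send(self, sender, receiver, content):
        if sender not in self.registered or receiver not in self.registered:
            raise KeyError("both agents must be registered")
        message = (sender, content)
        if receiver in self.connected:
            self.connected[receiver].append(message)
        else:
            self.pending[receiver].append(message)   # receiver offline: store


router = Router()
for agent in ("ARA-T", "CSCA-T-P"):          # hypothetical agent names
    router.register(agent)
router.connect("ARA-T")
router.send("ARA-T", "CSCA-T-P", "T is high at t1")   # receiver not yet connected
inbox = router.connect("CSCA-T-P")                     # stored message is delivered
print(inbox)
```

The sender only needs the receiver's name, and the store-and-forward queue is what lets agents be developed and run independently of one another.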




Fig 4. JATLite router 

Within the system there are issues that have to be addressed, as it deals with a society of 
software agents whose tasks have to be co-ordinated [19]. Agent communication will help 
the agents (especially the MKRA) to choose temporally ordered actions, either by making schedules 
with plans and commitments, and/or by following explicit rules of social behaviour. In 
order for the agents to communicate, JATLite supports a standard agent communication 
language (ACL). The Knowledge Interchange Format (KIF) is the first aspect of a 
common language for interchanging knowledge among disparate programs [20] and has 
been used within the Knowledge Query and Manipulation Language (KQML) [21]. 
KQML is a message format (and message handling protocol) designed to 
support run-time knowledge sharing among agents. KIF, KQML and other associated 
standards are designed to simplify the communication between agents, thereby solving 
problems related to interactions and co-operation. An example of the communication 
between the agents in COMMAS using JATLite, and of how it makes use of this ACL 
standard, is shown in Figure 5. 
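
As a rough illustration of the message format, the sketch below renders a KQML-style message as a parenthesised performative followed by keyword parameters. The helper `kqml`, the agent names, the ontology name `gas-turbine` and the KIF-like content expression are assumptions made for this example; they are not taken from COMMAS or from the JATLite classes.

```python
def kqml(performative, **params):
    """Render a KQML-style message as an s-expression string."""
    # Python identifiers use "_", KQML parameter names use "-" (e.g. :reply-with).
    fields = " ".join(
        f":{key.replace('_', '-')} {value}" for key, value in params.items()
    )
    return f"({performative} {fields})"


msg = kqml(
    "tell",
    sender="ARA-T",                    # hypothetical agent names
    receiver="CSCA-T-P",
    language="KIF",
    ontology="gas-turbine",            # hypothetical ontology name
    content='"(state T high t1)"',     # hypothetical KIF-like content
)
print(msg)
```

The performative (`tell`, or `ask-one` for a query) states the intent of the message, while the `:content` field carries the knowledge itself in the language named by `:language`.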




Fig 5. Example of KQML messages within COMMAS 

This figure shows an example of how the agents at the final layer (MKRA) can derive a 
sensor failure by combining the results of the agents at the previous layers. It gives 
part of the monitoring procedure of a gas turbine using COMMAS. The agent 
monitoring the temperature informs the agent CSCA T-P, which is responsible for 
deriving a conclusion about the relationship between the temperature (T) and the 
pressure (P), that at time t1 T is high. The CSCA T-P then asks the agent CSCA P-CO2 
(which is responsible for deriving a relationship between the pressure and the 
carbon dioxide) what the state of the pressure is at time t1. The information that the 
pressure is normal is given by the agent that monitors the pressure. At the second layer 
the rest of the CSCA communicate in a similar way and reason logically to 
develop a comparison of behaviour between the measured attributes, as shown in Table 1: 



Table 1. Results from CSCA 

               T        P        CO2 
CSCA T-P       High     Normal   - 
CSCA P-CO2     -        Normal   Normal 
CSCA T-CO2     High     -        Normal 



At the final layer the MKRA specialised in diagnosis, based on the above results and 
reasoning with different computational intelligence techniques (e.g. CBR), informs the 
user that there is a sensor fault at time t1, from the temperature sensor. 
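
The step from Table 1 to the diagnosis can be sketched as follows. The paper states that the MKRA reasons with techniques such as CBR; the rule used here is a simplified stand-in introduced for illustration only: if exactly one attribute is reported abnormal while every related attribute is normal, the sensor of that attribute is suspected. The function name `diagnose` and the result strings are hypothetical.

```python
# Table 1 as reported by the second-layer agents.
csca_results = {
    "CSCA T-P":   {"T": "High", "P": "Normal"},
    "CSCA P-CO2": {"P": "Normal", "CO2": "Normal"},
    "CSCA T-CO2": {"T": "High", "CO2": "Normal"},
}


def diagnose(results):
    """Hypothetical rule: one abnormal attribute among normal ones -> sensor fault."""
    states = {}
    for report in results.values():
        for attribute, state in report.items():
            states.setdefault(attribute, set()).add(state)
    abnormal = sorted(a for a, seen in states.items() if seen != {"Normal"})
    if not abnormal:
        return "no fault"
    if len(abnormal) == 1:
        return f"suspected sensor fault: {abnormal[0]}"
    return "possible plant fault: " + ", ".join(abnormal)


print(diagnose(csca_results))
```

With the values of Table 1, only T deviates, so the sketch points to the temperature sensor, in line with the diagnosis reported above.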

5 Conclusion 

The philosophy of this work is based on a dynamic multi-agent software system, which 
employs communication skills, with decision-making functions for data interpretation in 
condition monitoring. In the past, a number of stand-alone intelligent condition 
monitoring systems have been constructed, all of which employ some intelligent system 
technique for data interpretation. COMMAS is a federated system, which takes the next 
step and promotes a decentralised and distributed intelligence approach to intelligent 
condition monitoring, through the use of a multi-agent approach, which could use the 
current systems [22]. 

The novel idea of this framework is the construction of an agent society in 
groups, based on the hierarchical reasoning during the condition monitoring procedure. 
The whole procedure of communication between these different kinds of agents also 
offers the positive result of improved interpretation, a flexible architecture and 
adaptability. This is because the condition monitoring problem has been mapped to a 
‘communication’ problem, where different kinds of agents seek to exchange messages to 
intelligently interpret data on an on-line basis. 

In terms of implementation issues, JATLite is one approach, but others will also be 
evaluated in the future. COMMAS will be used to interpret data from real applications in 
power engineering and to improve the performance of the stand-alone 
intelligent systems that exist in the field. Finally, the system will be applicable to 
any other application involving data interpretation issues in condition monitoring. 

6 References 

1. B. D. Gemmell, J. R. McDonald, R. W. Stewart, R. N. T. Brooke and B. J. 
Weir, (1994), “A Consultative Expert System for Fault Diagnosis On Turbine 
Generator Plant”, Institution of Mechanical Engineers’ Proceedings Part A, 
Journal of Power and Energy, December 1994. 

2. S. D. J. McArthur, S. C. Bell, J. R. McDonald, R. Mather and S. M. Burt, 
“Knowledge and Model Based Decision Support for Power System Protection 
Engineers”, Proceedings of the International Conference on Intelligent Systems 
Applications to Power Systems, ISAP’96, pp. 215-219. 







3. J. R. McDonald et al, (1997), “Intelligent knowledge based systems in electrical 
power engineering”, London: Chapman & Hall, ISBN 0412753200. 

4. C. Booth, J. R. McDonald, W. Hagman, (1995), “The use of Artificial Neural 
Networks for the Prediction and Classification of Vibration Behaviour in Plant 
Transformers”, American Power Conference, Chicago. 

5. J. H. Williams, A. Davies and P. R. Drake, (1992), Condition-Based 
Maintenance and Machine Diagnostics, Chapman & Hall. 

6. A. Moyes, G. M. Burt, J. R. McDonald, J. R. Capener, J. N. Dray and R. 
Goodfellow, “Combining design and operational knowledge to enhance 
generator plant diagnostics”, IEE Proc. - Gener. Transm. Distrib. Vol. 143, No 
3, May 1996, pp. 300-304. 

7. R. Barron, (1996), Engineering Condition Monitoring: Practice, Methods and 
Applications, Addison Wesley Longman. 

8. S. M. C. Peers, “Knowledge Representation in a Blackboard System for Sensor 
Data Interpretation”, Methodology and Tools in knowledge based systems, vol. 
1, 1998, pp.657-666. 

9. Wooldridge M. J. & Jennings N. R., (1998), Agent Technology: Foundations, 
Applications, and Markets, Springer-Verlag. 

10. Pattie Maes, (1991), “Designing Autonomous Agents”, Theory and Practice 
from Biology to Engineering and back, the MIT Press. 

11. A. S. Rao and Georgeff, (1995), “BDI agents: From theory to practice”. Tech. 
Rep. 56, Australian Artificial Intelligence Institute, Melbourne, Australia. 

12. R. Mather, T.Cumming, S.D.J. McArthur, S.C.Bell and J.D.McDonald, “The 
development of an advanced suite of data interpretation facilities for the 
analysis of power system disturbances”, Cigre 1998, Group 34, paper 34-102. 

13. D. Cockburn and N. R. Jennings, (1996) “ARCHON: A distributed artificial 
intelligence system for industrial applications”. Foundations of Distributed 
Artificial Intelligence, John Wiley & Sons. 

14. Jennings et al, (1996), ADEPT: Managing Business Processes using Intelligent 
Agents, Conference Proceedings BCS Expert Systems 1996. 

15. Ferber Jacques, (1999), Multi Agent Systems: An Introduction to Distributed 
Artificial Intelligence, ADDISON-WESLEY. 

16. The source for Java technology, http://java.sun.com/products/index.html 

17. JATLite, http://java.stanford.edu 

18. JATLite Router, http://www.cdc.unict.it/~michele/JATLite/docs/ 

19. E. H. Durfee, “Practically Co-ordinating”, AI Magazine, vol. 20, pt. 1, 1999, 
pp. 99-116. 

20. UMBC AgentWeb, KIF: Knowledge Interchange Format 
http://www.cs.umbc.edu/kse/kif/ 

21. Finin T., Labrou Y. & Mayfield J., KQML as an agent communication 
language, in Bradshaw Jeffrey, (1997), Software Agents. 

22. Jennings et al, (1993), Transforming Standalone Expert Systems into a 
Community of Cooperating Agents, International Journal of Engineering 
Applications of Artificial Intelligence Vol. 6, No 4. 




Modeling Issues for Rubber-Sheeting Process in 
an Object Oriented, Distributed and Parallel Environment 



Frederick E. Petry and Maria J. Somodevilla 

Department of EECS, Tulane University, New Orleans, LA, 70118 
{petry, somodevi}@eecs.tulane.edu 



Abstract. The rubber-sheeting issues for GIS conflation are assessed in 
this paper. This work is an early step in the process of defining a methodology 
for integrating geospatial data from multiple sources. We build 
on an improved algorithm for rubber-sheeting in an OO, distributed and 
parallel environment. The proposed framework is motivated by previous 
work on parallel virtual machines and mobile agents. The critical 
issues that arise from this assessment will then be used in the development 
of the rubber-sheeting prototype. 

Keywords: conflation, distribution, mobile agents, OO, parallelism, 
rubber-sheeting 



1 Introduction 

In this paper we survey the background of rubber-sheeting for GIS conflation 
and then provide an overview and assessment of its modeling issues. This 
assessment lets us focus on the critical issues for prototype 
development. GIS deal with spatial information, which is commonly represented 
as a map. Since different maps of the same geographic area exist, representing 
different themes, it is necessary to combine the information from them to obtain 
a more comprehensive map. This process is referred to as conflation. The goal of 
this work is to model the rubber-sheeting problem. To achieve this goal we focus 
on distributed aspects of conflation by combining concepts from the OO, distributed 
and parallel paradigms. Distributed computing has allowed the representation 
and manipulation of geographically distributed real-world objects, but in a static 
way. Nowadays, we can think of the Internet as an environment for interacting 
processes using mobile agents, so we propose to use intelligent mobile agents as 
the primary mechanism for mediating the information from different sources. 

The next section provides background information on the steps involved in 
conflation. Section 3 covers related work on conflation and on distributed and 
mobile computing, presenting our problem and its proposed development environment. 
In Section 4, we address some issues related to map conflation which 
describe our approach to the rubber-sheeting process. We conclude in Section 5 
with a summary and the future of this work. 



R. Loganantharaj et al. (Eds.): IEA/AIE 2000, LNAI 1821, pp. 693-699, 2000. 
© Springer-Verlag Berlin Heidelberg 2000 






2 Background 

Conflation is an iterative, multi-step process that involves rubber-sheeting, 
identification of matching features, and positional and attribute deconfliction 
across different datasets, which are described in the following subsections. 

2.1 Rubber-Sheeting 

The first step in map conflation is a process commonly known as rubber-sheeting. 
The process takes its name from the analogy of stretching a piece 
of rubber to fit over some object. In this technique coordinates from two maps 
are iteratively brought into alignment, or registered, with each other. First of all, it is 
necessary to define the set of points contained in the base or reference map, and 
their corresponding points in the second map, which is to be aligned to it. The second map is 
referred to as the rubber-sheeted map. 




Figure 1: An example of triangulation and rubber-sheeting 

Traditionally, the map with the best overall accuracy is taken as the base 
map. Another approach uses the most accurate points from the two maps, so that 
both maps have to be stretched in the rubber-sheeting process. 
Rubber-sheeting is achieved in two phases: triangulation and 
rubber-sheet transformation. Delaunay triangulation is used to divide the entire 
map into triangular regions. It is considered by some experts [5] to be the 
best triangulation for rubber-sheeting. This triangulation avoids the formation 
of long, thin triangles, which do not lead to good results in rubber-sheeting. 
In addition, it produces a unique triangulation regardless of the order in which the 
points were entered. This property is well suited to GIS applications, where 
updates are performed very often. After the base map has been triangulated, 
the rubber-sheet transformation is applied in order to obtain 
the same triangulation in the rubber-sheeted map. For each triangle formed in 
the rubber-sheeted map, transformation coefficients are calculated for the pair of 
equivalent triangles. These coefficients, which determine scaling, rotation and 
translation, are used to geo-rectify non-matched points. 
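
One common way to obtain such coefficients is to fit a six-parameter affine map to each pair of equivalent triangles; the sketch below assumes this formulation, which the text does not spell out (a general affine map also allows shear in addition to scaling, rotation and translation). The function names and the sample coordinates are illustrative only.

```python
def affine_from_triangles(src, dst):
    """Return (a, b, c, d, e, f) with x' = a*x + b*y + c and y' = d*x + e*y + f,
    mapping the three vertices of `src` onto the three vertices of `dst`."""
    (x1, y1), (x2, y2), (x3, y3) = src
    det = x1 * (y2 - y3) - y1 * (x2 - x3) + (x2 * y3 - x3 * y2)
    if det == 0:
        raise ValueError("source triangle is degenerate")

    def solve(v1, v2, v3):
        # Cramer's rule for p*x + q*y + r = v at the three source vertices.
        p = (v1 * (y2 - y3) - y1 * (v2 - v3) + (v2 * y3 - v3 * y2)) / det
        q = (x1 * (v2 - v3) - v1 * (x2 - x3) + (x2 * v3 - x3 * v2)) / det
        r = (x1 * (y2 * v3 - y3 * v2) - y1 * (x2 * v3 - x3 * v2)
             + v1 * (x2 * y3 - x3 * y2)) / det
        return p, q, r

    a, b, c = solve(*(x for x, _ in dst))
    d, e, f = solve(*(y for _, y in dst))
    return a, b, c, d, e, f


def transform(coeffs, point):
    """Apply the affine coefficients to a non-matched point inside the triangle."""
    a, b, c, d, e, f = coeffs
    x, y = point
    return a * x + b * y + c, d * x + e * y + f


# Triangle in the rubber-sheeted map and its equivalent in the base map.
sheet_triangle = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
base_triangle = [(1.0, 1.0), (3.0, 1.0), (1.0, 3.0)]
coeffs = affine_from_triangles(sheet_triangle, base_triangle)
print(coeffs)
print(transform(coeffs, (0.25, 0.25)))
```

Because each triangle gets its own coefficients, neighbouring triangles agree along their shared edge, so the stretched map remains continuous.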

Similar triangulations for the base map (solid lines) and for the rubber-sheeted map 
(dashed lines) were obtained, as shown in Figure 1. Lighter points are the original 
rubber-sheet points, while darker points are the new stretched points. Notice that 
the northernmost original rubber-sheet point had to be displaced in a southwestward 
direction. This displacement is due to the fact that the far left corner of the 
rubber-sheet enclosing triangle must be stretched southwestward to align with 
the corresponding triangle corner of the base triangulation. The displacement of 
the other rubber-sheet points can be justified by similar arguments. 



2.2 Feature Matching 

Feature matching involves the identification of features from different maps as 
being representations of the same geographic entity. This step has to identify 
the points which are identical between the maps, in order to create corresponding 
triangles in the rubber-sheet map. 

Strong matching criteria [7] need to be defined in automated conflation systems 
to achieve accurate positional realignment of coordinates, so accurate feature 
matching results are imperative for rubber-sheeting. Properties related to 
distance measures, topology, geometry, attribute similarity, and graph properties 
are commonly considered in defining match criteria. 

2.3 Feature Deconfliction 

Feature deconfliction is concerned with resolving feature inconsistencies to produce 
a single best cartographic and logical representation. This technique is used 
to determine whether the rubber-sheeting process has sufficiently aligned the two 
maps and whether features have been matched correctly. 

According to matching theory [9] the most common errors in matching are: 
(1) a map feature is identified as having a match either when in fact it does 
not, or when its matching feature is incorrectly selected, and (2) a map feature 
is identified as not having a match when in fact it does. 



3 Related Works 

The algorithm described in Section 3.1 showed higher spatial accuracy in the coordinate 
realignment process than its predecessors. The Internet applications discussed in Section 3.2 
exhibit the advantages of using OO, distribution and parallelism concepts. 






3.1 Triangulation and Rubber-Sheeting 

Traditionally, a “better map” is chosen by experts as the “base map” and the others 
are considered the rubber-sheeted maps. An automated conflation of data 
using Delaunay triangulation, with an improved algorithm to choose the points 
to triangulate, has been developed in [10]. In this algorithm, for each pair of 
points that represent the same geographic location, a weighted accuracy is 
calculated. A smaller circular standard error is desirable for this weighted average 
of two map data points, so the point with the smaller circular error is given the 
greater weight. Using these weights, the new weighted x and y average values 
are obtained and used as the coordinates of the new point in the triangulation. 
Both maps are then stretched into alignment, which experimentally gave much 
better positional realignment accuracy. 
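
The weighting step can be sketched as follows. The text only states that the point with the smaller circular error receives the greater weight; the inverse-error weights used here are an assumption made for illustration and are not necessarily the scheme of [10]. The function name and sample values are hypothetical.

```python
def weighted_point(p1, err1, p2, err2):
    """Combine two matched points, weighting each inversely to its circular error.

    Inverse-error weighting is an assumed scheme: it satisfies the stated
    requirement that the smaller error gets the greater weight.
    """
    w1, w2 = 1.0 / err1, 1.0 / err2
    total = w1 + w2
    x = (w1 * p1[0] + w2 * p2[0]) / total
    y = (w1 * p1[1] + w2 * p2[1]) / total
    return x, y


# Same location on two maps; the first map has the smaller circular error.
merged = weighted_point((100.0, 200.0), 10.0, (104.0, 208.0), 30.0)
print(merged)
```

In this example the first point carries three quarters of the total weight, so the new triangulation point lies closer to it, and both maps are stretched towards the merged position.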



3.2 From Distributed to Mobile Computing 

Networks connecting high-performance machines and workstations [4] offer 
infrastructures that can solve complex distributed problems. Some 
projects [3] are based on this concept of a parallel virtual machine. 

Distributed objects, as in the projects cited above, allow the representation 
and manipulation of geographically distributed real-world objects, but they are 
mostly static. Distributed mobile agents add autonomy and agency to distributed 
objects, which means that their behavior is proactive and adaptive. This new 
conception of distributed systems [1] allows agents to be sent far and wide across 
the network, populating users’ machines with new functionality and providing great 
flexibility for building applications on the Internet. 

The underlying objective of our research is to explore the distributed 
aspects of conflation. We think that mobile agent technology is a candidate for 
implementing an improved solution to the modeling and execution of the 
rubber-sheeting process. 

4 Issues in Modeling Rubber-Sheeting 

Modeling rubber-sheeting in such a unified framework allows us to explore 
new ways to represent and manipulate space and the spatial relationships needed 
for efficiently conflating spatial data from multiple sources. We also suggest how 
intelligent mobile agents may be used to mediate information from these sources. 



4.1 Object-Oriented Issues 

Spatial data are very complex [8] because the value of a spatial attribute 
depends on the position of its geographic entity. Spatial data have geometric 
and topological relationships. Geometry relates to the shape and size 
of the objects, and topology to adjacency, containment and overlap, among 
others. 






Thus, a point object can represent a geometric map location given by its 
coordinates. Likewise, a triangle object would be an ordered list of three points 
representing the triangulated space. Finally, the class map, which would 
contain map objects, may be the generalization of the triangle and point classes. The 
topological relationships can be captured through methods. For instance, it will 
be necessary to define a method to determine the enclosing triangle of a point 
to be rubber-sheeted. 
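
A minimal sketch of these classes is given below. The class and method names (`Point`, `Triangle`, `Map`, `contains`, `enclosing_triangle`) are hypothetical, and for simplicity the map is modelled here as a container of triangles rather than as a generalization of the other two classes.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Point:
    x: float
    y: float


@dataclass(frozen=True)
class Triangle:
    a: Point
    b: Point
    c: Point

    def contains(self, p: Point) -> bool:
        """True if p lies inside the triangle or on its boundary."""
        def cross(o: Point, u: Point, v: Point) -> float:
            return (u.x - o.x) * (v.y - o.y) - (u.y - o.y) * (v.x - o.x)

        d1 = cross(self.a, self.b, p)
        d2 = cross(self.b, self.c, p)
        d3 = cross(self.c, self.a, p)
        has_negative = d1 < 0 or d2 < 0 or d3 < 0
        has_positive = d1 > 0 or d2 > 0 or d3 > 0
        # p is inside when it is on the same side of all three edges.
        return not (has_negative and has_positive)


@dataclass
class Map:
    triangles: List[Triangle]

    def enclosing_triangle(self, p: Point) -> Optional[Triangle]:
        """Topological query: the first triangle of the map that contains p."""
        for triangle in self.triangles:
            if triangle.contains(p):
                return triangle
        return None


lower = Triangle(Point(0, 0), Point(4, 0), Point(0, 4))
upper = Triangle(Point(4, 0), Point(4, 4), Point(0, 4))
base_map = Map([lower, upper])
print(base_map.enclosing_triangle(Point(3, 3)))
```

The linear scan is enough to show the idea; a spatial index would be the natural refinement once the number of triangles grows.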



4.2 Distribution Issues 

In this section we address several significant issues related to distribution, such as 
distribution type, fragmentation, replication, and migration [6]. Regarding the choice 
between a centralized and a distributed type of distribution, a distributed 
scheme is more robust and allows better load balancing, at the expense 
of higher computational costs. Whether to distribute maps or regions will depend on the 
number of points to be realigned. To achieve an efficient distributed model, the 
decision about which objects to distribute should be made dynamically. 

Horizontal fragmentation may be enough, considering that regions and maps 
can have the same attributes. Vertical fragmentation is expensive because it 
requires the creation of new classes. Replication of neighboring triangles might 
be necessary for local optimization in triangulation, that is, for obtaining a more 
uniform triangulation. Replication might also be useful when the triangles are 
rubber-sheeted, in order to decrease communication costs. 

Migration of objects might be applied to balance loads among processors, 
for instance when the triangulation becomes finer. Also, after the coordinate values 
have been adjusted, the new stretched points can belong to a different region, and it is then 
imperative to migrate these points. 

4.3 Parallelism Issues 

Performance in parallelism is measured by speed-up and efficiency [2]. The factors 
affecting performance are data partitioning, load balancing and communication 
costs. Data partitioning depends on data topology, i.e. the regularity and homogeneity 
of the space. Spatial data in general are not regular and homogeneous, so 
dynamic methods for data partitioning are required. While point operations are 
straightforward, neighborhood or global operations require data duplication or 
interprocess communication. 

Static and dynamic load balancing schemes are applicable in GIS, but a combination 
of both can improve the general performance. For instance, it would be 
reasonable to begin the process using an equal area division (i.e. one map per 
processor). At the end of the maps’ triangulation the loads can be dynamically 
distributed according to the points to be rubber-sheeted. Communication is required 
for the three steps of conflation. Choosing the most appropriate way of 
partitioning the data is imperative to reduce the communication costs. 






4.4 Mobile-Agent Issues 

OO is a natural way of designing mobile agents, since encapsulation is compatible 
with their self-contained nature. Distribution issues can be addressed through 
mobility, and parallelism is inherent to mobile agents, since multiple agents can be 
dispatched to search at the same time. 

Conflation is a well-defined complex problem in GIS, so we can separate 
conflation into pieces which can be coordinated using agents. The next step 
would be to choose the types of agents for doing conflation. Appropriate agent 
types could be geometry, topology, rubber-sheeting and feature matching agents, and 
a manager agent to coordinate collaborative efforts among the previous working 
agents. Since different coordination models [1] fit different applications, it is 
necessary to choose the best one for conflation. 

In general, conflation agents can detect multiple feature representations and 
implement conflict resolution strategies. For instance, multiple agents can be 
dispatched to search through different maps to identify candidate matching features; 
rubber-sheeting and feature matching agents would then cooperate with 
the other agents to complete the conflation process. 

5 Conclusions 

This project of modeling rubber-sheeting issues is motivated by previous work 
on conflation, and on distributed and mobile computing. We build on an improved 
rubber-sheeting algorithm because of its higher spatial accuracy in the realignment 
of coordinates from two maps. Distributed and mobile computing technology 
has shown improvements in developing complex applications by using OO, 
distribution and parallelism concepts. The research is currently in its design phase, 
through an assessment of the modeling issues for rubber-sheeting. The 
next steps will use these critical issues in prototype development. 



References 

1. Cabri et al. Mobile-Agent Coordination Models for Internet Applications. 
Computer, February 2000:82-89. 

2. Ding Y. and Densham P. Spatial Strategies for Parallel Spatial Modeling. Int. J. 
of GI Science, Vol. 10, No. 6, 1996:669-698. 

3. Ferrari A. JPVM. The Java Home Page, Feb. 1999. www.cs.virginia.edu/jpvm. 

4. Fox et al. High Performance Commodity Computing on Top of Integrated Java, 
CORBA, DCOM and Web Standards. Euro-Par’98 Parallel Proc. Conf.:55-79. 

5. Gillman D. Triangulations for Rubber Sheeting. In Proc., Auto-Carto 7, Falls 
Church, VA: ACSM/ASP, 1985. 

6. Ozsu M. and Valduriez P. Principles of Distributed Databases Systems. Prentice 
Hall, 1991. 

7. Rosen B. and Saalfeld A. Match Criteria for Automatic Alignment. In Proc., 
Auto-Carto 7, Falls Church, VA: ACSM/ASP, 1985. 

8. Rumbaugh et al. Object Oriented Modeling and Design. Prentice Hall, 1991. 

9. Saalfeld A. Conflation: Automated Map Compilation. Int. J. of GI Science, 
2(3):217-228, 1988. 

10. Wilson R. Automatic Conflation of Vector Product Format Data Using Delaunay 
Triangulation with a Modified Algorithm to Choose the Points to Triangulate. MS 
Thesis, McNeese State University, 1997. 



Reasoning and Belief Revision in an Agent for 
Emergent Process Management 

John Debenham 

University of Technology, Sydney, PO Box 123, NSW 2007, Australia 
debenham@socs.uts.edu.au 

Abstract. An agent architecture is designed to support emergent business 
process management. The conceptual agent architecture is a three-layer BDI, 
hybrid architecture. As a hybrid architecture it balances proactive and reactive 
reasoning. Multi-agent systems for emergent process management can generate 
a substantial amount of inter-agent communication that can lead to non-trivial 
belief revision. The architecture has been trialed on emergent process 
management in a university administrative context. 



1 Introduction 

An intelligent multi-agent system is a society of autonomous cooperating components 
each of which maintains an ongoing interaction with its environment. Intelligent 
agents should be autonomous, cooperative and adaptive. The process agent 
architecture is designed specifically for emergent process applications. The term 
‘agent’ has a wide range of meanings [1] in the research literature. The term ‘agent’ is 
used here, following [2], in the sense that “an agent is a computer system, situated in 
some environment, that is capable of flexible autonomous action in order to meet its 
design objectives” and “the term ‘multi-agent systems’ ... is now used to refer to all 
types of systems composed of multiple (semi-) autonomous components”. The work 
described here focuses on the reasoning and belief revision of an intelligent multi-agent 
system for emergent process management. 

Process management is an established application area for multi-agent 
systems [3] [4]. The term emergent process is taken here to refer to processes that are 
not pre-defined, that are usually not of a routine nature and that may rely on some 
level of initiative from the system to bring them to a conclusion [5] [6]. One valuable 
feature of process management as an application area is that ‘real’ experiments may 
be performed with the cooperation of local administrators; a system for postgraduate 
enrolment [7] was trialed in this way. The process agent architecture has been trialed 
on emergent process applications within university administration. 

2 Agent Architecture 

A variety of architectures have been described for autonomous agents [1]. A 
fundamental distinction is the extent to which an agent architecture exhibits 
deliberative (feed forward, planning) reasoning and reactive (feed back) reasoning. If 
an architecture combines these two forms of reasoning it is a hybrid architecture. 
One well reported class of hybrid architectures is the three-layer, BDI agent 
architectures. One member of this class is the InteRRaP architecture [8], which has 
its origins in the work of [9]. 



R. Loganantharaj et al. (Eds.): IEA/AIE 2000, LNAI 1821, pp. 699-705, 2000. 
© Springer-Verlag Berlin Heidelberg 2000 







The process agent architecture described here is similar to the InteRRaP 
architecture; it is based on a set of basic concepts or “mental categories” [8]. These 
concepts are: beliefs (represented in a social model, self-model and world model), 
goals (categorised as: cooperative goals, local goals and procedures), triggers 
(categorised as: cooperative triggers, local triggers and procedure triggers), plans and 
intentions (categorised as: cooperative intentions, local intentions and procedural 
intentions). The process agent architecture consists of a conceptual architecture and a 
control architecture [op.cit.]. The conceptual architecture describes how the agents 
are specified. The control architecture describes how the agents operate. 

The conceptual architecture differs from the InteRRaP conceptual 
architecture [8], which is applied to the design of “forklift agents” used in a 
loading dock case study. The forklift application is safety critical; emergent process 
applications are not. However, in both systems things can “go wrong”; a powerful 
agent architecture is particularly suited to such applications. In the conceptual 
architecture of the process agent, world beliefs are derived either from reading 
messages received from a user, or from reading the documents involved in the 
process instance, or from reading messages received from other agents. These 
activities are fundamentally different. Documents are “passive” in that they are read 
only when information is required. Users and other agents are “active” in that they 
send messages when they feel like it. Beliefs play two roles. First, they can be partly 
or wholly responsible for the agent committing to a goal, and may thus initiate an 
intention (e.g. a plan to achieve what a message asks, such as “please do xyz”). This 
is an example of deliberative reasoning. Second, they can be partly or wholly 
responsible for the activation of a trigger that will directly affect the execution of an 
active plan. This is an example of reactive reasoning. 

The deliberative reasoning mechanism employs the non-deterministic
procedure: "on the basis of current beliefs, identify the current options; on the basis
of current options and existing commitments, select the current commitments (or
goals); for each newly-committed goal choose a plan for that goal; from the selected
plans choose a consistent set of things to do next (called the agent’s intentions)". If
the current options do not include a current commitment then that commitment is
dropped. The reactive reasoning mechanism employs triggers that observe the
agent’s beliefs. If those triggers fire then they take precedence over the agent’s
deliberative reasoning.
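
The deliberative procedure quoted above can be sketched in a few lines. This is
only an illustrative sketch, not the Java implementation described in this paper:
the class Agent, the function deliberate and its four policy arguments are
hypothetical names, and the non-deterministic choices are modelled as functions
supplied by the caller.

```python
from dataclasses import dataclass, field


@dataclass
class Agent:
    beliefs: set = field(default_factory=set)
    commitments: set = field(default_factory=set)   # goals the agent is committed to
    plans: dict = field(default_factory=dict)       # goal -> plan chosen for that goal
    intentions: list = field(default_factory=list)  # things to do next


def deliberate(agent, identify_options, select_goals, choose_plan, choose_consistent):
    """One pass of the deliberative procedure (hypothetical sketch)."""
    options = identify_options(agent.beliefs)
    goals = select_goals(options, agent.commitments)
    # A commitment that is not among the current options is dropped.
    goals = {g for g in goals if g in options}
    for goal in goals - agent.commitments:      # newly committed goals get a plan
        agent.plans[goal] = choose_plan(goal)
    for goal in agent.commitments - goals:      # dropped goals lose their plan
        agent.plans.pop(goal, None)
    agent.commitments = goals
    selected = [agent.plans[g] for g in sorted(goals) if g in agent.plans]
    agent.intentions = choose_consistent(selected)
    return agent.intentions
```

With policies that commit to every option and intend the first step of each plan,
an agent that believes "X asked for report" ends up with the single intention
"request data"; once that belief disappears, the commitment is dropped on the
next pass.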

An early step in the design process for a multi-agent system is determining 
the system organisation. In the application described here, the system organisation 
consists of one process agent for each (human) user. There are no other agents in the 
system. 

The role of a web-based process management system [6] in its direct 
communication with each user is: 

• to manage the user’s web-based ‘In Tray’,

• to clear the user’s ‘Out Tray’, 

• to manage the storage and retrieval of documents, and 

• to communicate with the user. 

If a process agent wishes to communicate with its user then it may do so either by
modifying the user’s ‘In Tray’ or by sending the user a message, perhaps by email.
A user communicates with their agent by placing a document in the user’s ‘Out Tray’;
this may, for example, be achieved by ‘clicking’ a software ‘button’ on that
document. If a document is placed in the user’s ‘Out Tray’ then this will be realised
as one of the agent’s incoming messages, and so it may then become a belief. In
addition to its direct communication with users the multi-agent system manages the
documents, performs automatic document checking and so on.

Inter-agent communication uses KQML (Knowledge Query and 
Manipulation Language) as a wrapper language [10]. Each process agent has a 
message area. If agent A wishes to tell something to agent B then it does so by 
posting a message to agent B’s message area. Each agent has a message manager 
whose role is to look after that agent’s message area. Each message contains an 
instruction for the message manager. Two such instructions are: 

• post message and remove on condition: the sender is asking the receiving agent’s
message manager to display the message in the receiving agent’s message area
until the stated condition is satisfied, and

• remove message: the sender is asking the receiving agent’s message manager to
remove an existing message from the receiving agent’s message area.
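
The two instructions above might be handled by a message manager along the
following lines. This is a minimal sketch under stated assumptions: the KQML
wrapper is not modelled, and the names Message, MessageManager, remove_when and
sweep are hypothetical, introduced only for illustration.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Message:
    sender: str
    content: str
    remove_when: Callable[[], bool]  # the sender's stated removal condition


class MessageManager:
    """Looks after one agent's message area (hypothetical sketch)."""

    def __init__(self):
        self.area = {}  # message id -> Message

    def post_and_remove_on_condition(self, msg_id, message):
        # Display the message until its removal condition is satisfied.
        self.area[msg_id] = message

    def remove_message(self, msg_id):
        # Explicit removal requested by the sender.
        self.area.pop(msg_id, None)

    def sweep(self):
        """Drop every message whose removal condition is now satisfied."""
        for msg_id in [i for i, m in self.area.items() if m.remove_when()]:
            del self.area[msg_id]
```

If agent A posts a message to agent B's manager with a condition such as "the
review is done", the message stays in B's message area across calls to sweep
until that condition holds, or until A sends a remove message instruction.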



3 Reasoning 

As a hybrid architecture the process agent architecture exhibits both deliberative and
reactive reasoning [1]. Deliberative reasoning is managed within a
goal-plan-intention framework [9]. Reactive reasoning is effected with triggers.



3.1 Deliberative Reasoning 

The process agent employs a form of "plan" that is rather more elaborate than many
types of agent plan [9]. In emergent process management, process instances are prone
to local failure. An agent architecture that can deal naturally with failure is described
in [op. cit.]. Plans are built there from single-entry, triple-exit blocks, where the three
exits represent success, failure and abort. Powerful though that approach is, it is
insufficient for emergent processes because whether a plan has executed successfully
is not necessarily related to whether that plan’s goal has been achieved.

A plan cannot necessarily be relied upon to achieve its goal even if all of the
sub-goals on a chosen path through the plan have been achieved. On the other hand,
if a plan has failed to execute then it is possible that the plan’s goal may still have
been achieved. So, in emergent process applications a necessary sub-goal in every
plan body is a sub-goal called the "success condition". The success condition (SC) is
a procedure whose goal is to determine whether the plan’s goal has been achieved.
The success condition is the final sub-goal on every path through a plan. The success
condition is a procedure; the execution of that procedure may succeed (✓), fail (✗) or
abort (A). If the execution of the success condition fails then the overall success of
the plan is unknown (?). So the four possible plan exits resulting from an attempt to
execute a plan are as shown in Fig. 1.
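
The four plan exits can be illustrated with a short sketch. The mapping used here
is one possible reading of the text, not the paper's implementation: it assumes
that the success condition returns True or False when it can decide whether the
goal was achieved, and that an error while evaluating it leaves the outcome
unknown. The names Exit, AbortPlan and run_plan are hypothetical.

```python
from enum import Enum


class Exit(Enum):
    SUCCESS = "✓"
    FAIL = "✗"
    ABORT = "A"
    UNKNOWN = "?"


class AbortPlan(Exception):
    """Raised when a sub-goal or the success condition is aborted."""


def run_plan(sub_goals, success_condition):
    """Run the sub-goals, then let the success condition decide the exit."""
    try:
        for sub_goal in sub_goals:
            try:
                sub_goal()
            except AbortPlan:
                raise
            except Exception:
                # The body failed to execute, but the goal may still have
                # been achieved, so the success condition is still evaluated.
                break
        achieved = success_condition()
    except AbortPlan:
        return Exit.ABORT
    except Exception:
        return Exit.UNKNOWN  # assumption: the SC itself could not be evaluated
    return Exit.SUCCESS if achieved else Exit.FAIL
```

The point of the design is visible in the control flow: the exit is decided by
the success condition, which runs last on every path, rather than by whether the
plan body executed without error.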








The deliberative frequency is the frequency at which the deliberative process
is activated. This process determines current options, selects current goals, and so on
as described in the agent control architecture above. The deliberative process
should run often enough to keep the system moving but not so often that an
individual’s "In Tray" is seen to be constantly changing. For emergent process
management, activating it once every 10 to 60 minutes seems
appropriate.

3.2 Reactive Reasoning 

Reactive reasoning plays three roles: first, a plan is aborted if its specified abort
condition is satisfied; second, data is passed to partly executed plans for goals an
agent is committed to achieve; and third, urgent messages are dealt with. Of the first
two roles, the first takes precedence over the second. The third role for
reactive triggers handles messages of the form "stop what you are doing and do this";
this third role has yet to be implemented.

Each plan contains an optional abort condition [ab]. These abort conditions
are realised as reactive triggers. These triggers scan the agent’s beliefs for the
presence or absence of specific conditions. For example, "if I do not believe that X
wants Y then abort the plan whose goal is to deliver Y to X" is a
reactive trigger. Reactive triggers are only active if the agent is presently committed
to achieve the goal of the plan to which they are attached. If a plan is aborted then
any plans for active sub-goals of that plan are also aborted.

Data is passed to partly executed plans using reactive triggers. For example,
the goal of the plan illustrated in Fig. 2 is "X’s opinion on Y has been obtained". The
plan for that goal has one sub-goal "X’s opinion on Y has been requested" and
another sub-goal "X’s opinion on Y is Z". This second sub-goal may be achieved if
"X’s opinion on Y is Z" is present in the agent’s world beliefs. So until such a belief
materialises an attempt to achieve this second sub-goal may "hang". This situation is
managed by linking the second sub-goal to a reactive trigger "I believe that: X’s
opinion on Y is Z". This reactive trigger "watches" the agent’s world beliefs. If and
when this reactive trigger fires the second sub-goal is instantiated and is achieved.
Reactive triggers of this form are associated with sub-goals. These triggers are
activated when their associated sub-goal is committed to but has not been achieved.
Triggers of this form provide a mechanism for passing data to such sub-goals.

Fig. 2. Segment of a plan and a reactive trigger

The reactive frequency is the frequency at which an attempt is made to fire
all active abort triggers and reactive triggers. Triggers should be checked often
enough to avoid delays but not so often that the agent is repeatedly scanning a set of
seldom changing beliefs. The reactive frequency is set at the order of once per minute
for emergent process management, so reactive reasoning runs more often than
deliberative reasoning. The abort triggers have a
higher priority than reactive triggers. So if a plan’s abort trigger fires and an active
sub-goal in that plan is the subject of a reactive trigger, then that sub-goal is
deactivated, preventing that reactive trigger from firing even if the required belief is
in the world beliefs.
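
The priority of abort triggers over reactive triggers can be sketched as a
two-pass scan of the agent's beliefs. This is an illustrative sketch only: the
names SubGoal, Plan and reactive_cycle are hypothetical, beliefs are modelled as
a set of strings, and nested plans for sub-goals are left out.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class SubGoal:
    name: str
    trigger: Callable[[set], bool]  # reactive trigger watching the world beliefs
    active: bool = True             # committed to but not yet achieved
    achieved: bool = False


@dataclass
class Plan:
    goal: str
    abort_condition: Optional[Callable[[set], bool]] = None
    sub_goals: list = field(default_factory=list)
    aborted: bool = False


def reactive_cycle(plans, beliefs):
    """Fire abort triggers first, then the reactive triggers still active."""
    for plan in plans:
        if plan.abort_condition is not None and plan.abort_condition(beliefs):
            plan.aborted = True
            for sub_goal in plan.sub_goals:
                sub_goal.active = False  # deactivated, so its trigger cannot fire
    for plan in plans:
        for sub_goal in plan.sub_goals:
            if sub_goal.active and not sub_goal.achieved and sub_goal.trigger(beliefs):
                sub_goal.achieved = True
```

Because the abort pass runs to completion before any reactive trigger is
examined, a sub-goal of an aborted plan is never achieved in that cycle, even
when the belief it is waiting for is already present.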



4 Belief Revision 

In the multi-agent system organisation described here each (human) user works with 
an agent. In emergent process applications, that form of organisation leads to a 
significant amount of inter-agent communication, which in turn generates a 
significant number of world beliefs in each agent for each process instance. The 
management of these beliefs is now discussed. In general emergent process 
management involves a wide variety of different types of inter-agent communication, 
and so leads to complex belief revision. Two types of communication are: 

• a request for an agent to do something that may progress a process instance, and 

• a request for an agent to modify its priorities between its process instances. 

Belief revision for these two types is managed using the
protocol that the sender of a message is responsible for the life of that message unless
that responsibility is delegated.

When a message arrives at an agent A it is "stamped" and then "posted"
on the agent’s message area. The message manager stamps the message with the
time of arrival and with the number of goals to which the agent has committed that
have been triggered by the presence of that message in the message area; for a newly
arrived message this number will be zero. So a newly arrived message is stamped:
time received (<time>)
no. goals (0)

"Stamping a message" is rather like the way that letters are sometimes stamped in an
office: "this letter was received on 4 July and has yet to be dealt with". The message
"stamp" becomes part of the message; the stamp is principally for the use of the
message manager. An agent can "read" the messages in its message area. A message
in an agent’s message area may lead to that agent forming a belief derived from the
contents of that message.
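
A message stamp of the kind described above could be represented as follows.
This is a sketch under assumptions: the field names time_received and no_goals
mirror the stamp shown in the text, while StampedMessage, stamp and
record_commitment are hypothetical helpers.

```python
import time
from dataclasses import dataclass, field


@dataclass
class StampedMessage:
    content: str
    time_received: float = field(default_factory=time.time)  # arrival time
    no_goals: int = 0  # goals committed to because of this message


def stamp(content):
    """Stamp a newly arrived message: arrival time, and zero triggered goals."""
    return StampedMessage(content)


def record_commitment(message):
    """Update the stamp when the message triggers a commitment to a goal."""
    message.no_goals += 1
```

A message whose goal count is still zero has, in the words of the office
analogy, "yet to be dealt with".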

Each process instance has a unique identifier. When the external transaction 
that created a particular process instance has been finally resolved, a number of 
beliefs related to that instance may remain. A garbage collection process removes 
these defunct beliefs. 



5 Conclusion 

The process agent described above is implemented in Java. It is implemented as an 
interpreter of high-level agent specifications. This interpreter enables agents to be 
built quickly and maintained easily. The agent’s reasoning incorporates deliberative 
and reactive reasoning. Belief revision is managed in part by making the 
responsibility for the life of a message both explicit and transferable. 



References 

1. Weiss, G. (ed) (1999). Multi-Agent Systems. The MIT Press: Cambridge, MA.

2. Jennings, N.R., Sycara, K. and Wooldridge, M.J. (1998). A Roadmap of Agent Research
and Development. In Autonomous Agents and Multi-Agent Systems, 1, 7-38 (1998)
Kluwer Academic Publishers.

3. Merz, M., Lieberman, B. and Lamersdorf, W. (1997). Using mobile agents to support
inter-organizational workflow management. Applied Artificial Intelligence, vol. 11(6) pp.
551-572, 1997.

4. Huhns, M.N. and Singh, M.P. (1998). Managing heterogeneous transaction workflows
with cooperating agents. In N.R. Jennings and M. Wooldridge (eds). Agent Technology:
Foundations, Applications and Markets. Springer-Verlag: Berlin, Germany, 1998, pp.
219-239.

5. O’Brien, P.D. and Wiegand, M.E. (1997). Agents of Change in Business Process
Management. In H.S. Nwana & N. Azarmi (Eds.) Software Agents and Soft Computing:
Towards Enhancing Machine Intelligence. Springer-Verlag (1997).

6. Norman, T.J., Jennings, N.R., Faratin, P., Mamdani, E.H. (1997). Designing and
Implementing a Multi-Agent Architecture for Business Process Management. In J.P.
Müller, M.J. Wooldridge & N.R. Jennings (Eds). Intelligent Agents III. Springer-Verlag
(1997).

7. Debenham, J.K. (1999). A Multi-Agent System for Emergent Process Management. In
proceedings Nineteenth International Conference on Knowledge Based Systems and
Applied Artificial Intelligence, ES’99, Cambridge UK, December 1999.

8. Müller, J.P. (1997). The Design of Intelligent Agents: A Layered Approach (Lecture
Notes in Computer Science, 1177). (May 1997), Springer-Verlag.

9. Rao, A.S. and Georgeff, M.P. (1995). BDI Agents: From Theory to Practice. In Proc 1st
Int Conf on Multi-Agent Systems (ICMAS-95), San Francisco, USA, pp 312-319.

10. Finin, T., Labrou, Y., and Mayfield, J. (1997). KQML as an agent communication
language. In Jeff Bradshaw (Ed.) Software Agents. MIT Press (1997).




System Design and Control Framework for an 
Autonomous Mobile Robot Application on Predefined 
Ferromagnetic Surfaces 



Mahmut Fettahlioglu¹ and Aydin Ersak²



¹ TUBITAK-BILTEN, METU, 06531 Ankara, Turkey
mahmut@bilten.metu.edu.tr
² METU, EEE Dept., 06531 Ankara, Turkey
ayersak@metu.edu.tr



Abstract. Maintenance tasks of ferromagnetic surfaces present a suitable field
for robotic applications. A system and controller software design satisfying the
system requirements of an autonomous robotic system moving on a ferromagnetic
surface accomplishing several tasks such as crack detection, welding,
painting and emergency recovery actions is presented. The controller software
developed here is a hierarchical control system based on the RCS Reference
Model Architecture. The software shell and selected developed modules of the
controller are presented.



1 Introduction 

Maintenance tasks need to be applied to structures made of ferromagnetic metal (FM)
surfaces (e.g. ships) during both their initial building period and later maintenance
stages. Surfaces of metallic structures are composed of FM panels placed consecutively
on the structure skeleton. Some of the common operations performed on structure
surfaces for maintenance purposes are crack detection, welding, painting, deburring,
and finding the location of and replacing faulty panels. Today, these operations
are mostly conducted manually, by human workers. As such, the tasks present a danger
to the workers' health, because of the falling risk, use of welding torches and
poisonous paints. These tasks are also physically strenuous and tiring. Such
characteristics show that these tasks present a suitable field for robotic applications.

Many research works have been presented in the field of autonomous mobile vehicles.
Among these, the ones that have used RCS, a reference model architecture for
building hierarchical controllers, are of special interest as RCS has also been used in
this work. Coal mining automation using a remote unmanned vehicle has been
investigated [1]. The achievements in nuclear submarine automation are
explained in [2] and [3]. The application of RCS in multiple autonomous undersea
vehicles is discussed in [4]. A vehicle that autonomously drives itself, following the
lane and avoiding other vehicles, is presented in [5]. The remote operation of unmanned
land vehicles, either by teleoperation or by retracing a previously recorded path, is
investigated in [6].

The robotic system desired for FM surface maintenance is an autonomous system
requiring minimal operator assistance. It needs to decide and react in real-time. It
comprises a lightweight robot that attaches to and moves along the structure surface
by utilizing magnetic power, driven by stepping motors. A separate tool specific to
the task type is used. A manipulator is required to move and direct the tool. A toolbox
capable of holding several types of tools is also desirable. A battery placed on board
backs up the external power supply of the robot. In addition, another robot is used to
place the robot on the surface prior to operation and to pick it up when operation ends
or in emergency cases such as a break in the main power supply cable.

R. Loganantharaj et al. (Eds.): IEA/AIE 2000, LNAI 1821, pp. 705-710, 2000.
© Springer-Verlag Berlin Heidelberg 2000

To reduce the system complexity, the mounting of a toolbox on the worker robot 
body, and the winch-robot, are not provisioned within the initial prototype, although 
they are considered as features that may be incorporated in the future. 

The robotic system should be able to control the operation of the robot(s) under
real-time requirements. It is assumed to make its own decisions autonomously, requiring
minimal operator assistance, e.g. taking high level commands. Such an autonomously
deciding and acting system will need to have a high degree of intelligence [7]. Thus,
the system may be defined as an autonomous system and an intelligent system.

In addition to performing its tasks in the correct way, the system is also required to 
respond to emergency events immediately. When such emergency conditions as 
worker robot main power supply failure, and humans or unknown objects entering into 
the workspace occur, the system needs to make decisions on recovery actions that are 
either predefined or generated online, and execute these actions in time. 



2 System Design 

The system needs to know some information about the working environment and the
robot a priori. This consists of the surface description, including irregularities and
special features, and the sensor and actuator locations on the robots.

The worker robot (WR) has three rubber coated wheels driven by stepping motors, a 5
DOF manipulator, tools specific to different tasks, and an electromagnet with an
adjustable airgap distance as actuators. As sensor systems, it has a fixed camera
observing the end-effector and its target, a number of sonar sensors to identify
objects on the surface, and odometry to detect slippage. An on-board computer in the
WR performs local computations and communication with the host system.

The laser beacon sensory system consists of three laser beacons (transmitters) with
pan/tilt capability and three receivers mounted on targets (surface and robot). The
positions of targets are determined using triangulation. This system acts as the main
sensory system that determines the position and orientation of the surface and the
position of the robot. It is used at startup to create the global world coordinate system,
and it continues to be used afterwards because the algorithm is simple and fast.
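
To illustrate the idea of triangulation, the sketch below intersects two bearing
rays in a plane. It is not the algorithm used in the system described here,
whose details are not given in the text: the planar setting, the use of only two
beacons, and the function name triangulate are assumptions made for illustration.

```python
import math


def triangulate(p1, theta1, p2, theta2):
    """Intersect two bearing rays; angles are in radians from the x-axis.

    p1 and p2 are the known (x, y) beacon positions; theta1 and theta2 are the
    bearings measured from each beacon toward the target.
    """
    d1 = (math.cos(theta1), math.sin(theta1))
    d2 = (math.cos(theta2), math.sin(theta2))
    denom = d1[0] * d2[1] - d1[1] * d2[0]      # 2D cross product d1 x d2
    if abs(denom) < 1e-12:
        raise ValueError("bearings are parallel; position is not determined")
    dx, dy = p2[0] - p1[0], p2[1] - p1[1]
    t1 = (dx * d2[1] - dy * d2[0]) / denom     # distance along the first ray
    return (p1[0] + t1 * d1[0], p1[1] + t1 * d1[1])
```

With beacons at (0, 0) and (2, 0) and bearings of 45 and 135 degrees, the rays
meet at (1, 1). A third beacon, as in the system described above, would allow
the estimate to be cross-checked or extended to three dimensions.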

Two cameras with pan and tilt capability externally observe the work area. The 
system, other than providing redundancy in position determination, helps in finding 
the orientation of the robot, and also has the crucial task of detecting humans or other 
objects in the workspace. 

A host computer runs the master controller software, merging information from all
the sensors and deciding which action to take. A second computer is used to
perform image processing in parallel.







3 Controller Design 



It is clear that software expected to control a system as rich in sensors and
actuators as the one proposed in the design cannot be implemented efficiently as a
classical sequential computer program while retaining desirable run-time
characteristics and maintainability. To overcome these problems, hierarchies may be
used to build levels of abstraction (resolution and precision). Hierarchies can be
functional hierarchies, or they can be hierarchies of range and resolution along
spatial and/or temporal dimensions [7]. Use of hierarchical structures is validated
to some extent by the fact that hierarchical structures capture the essential human
techniques of abstraction [8].

Using hierarchical systems enables the implementation of systems having great
complexity, yet composed of individual nodes of controlled complexity. At no level
does a node have to cope with both broad scope and a high level of detail [9]. This
both keeps the amount of computing resources needed in each node within manageable
limits [10], and improves human understanding of the design, making implementation of
each node and their hierarchy easier [11]. Software development is also enhanced and
accelerated as hierarchical layering enables concurrent software development and
code re-use [12]; software system extensibility, maintainability and alterability are
enhanced. Another benefit arises during software testing, as each node in the
hierarchy can be singled out and tested against various inputs and outputs [8].

Real-time Control System (RCS) is a reference model architecture utilizing a task
based spatial and temporal hierarchical architecture. RCS has been in active
development by NIST since the early 1980s [11]. Being a hierarchical architecture, RCS
has all the properties of hierarchical systems discussed above. It defines the
functional elements that are used to build a node in the hierarchy [13]. In addition, it
provides C and C++ APIs to communicate between nodes in the hierarchy, which may be
located in a single machine, or may be distributed between several machines [14].

Because of the benefits of the RCS reference model, and its applicability to complex
autonomous systems, the RCS reference model architecture was chosen as the
base for the controller software design. After various design iterations and
trial-and-error steps, a five level hierarchical system as given in Fig. 1 was adopted
as the structure of the controller software.

In this hierarchy, the master controller is responsible for high level decision
making, and for command and coordination of lower level systems.

Located as subordinates to the master controller, there are three systems: the
External Camera System, the Worker Robot System, and the Laser Beacon Positioning
System. As their names suggest, the External Camera System is responsible for the
processing and merging/matching of captured images from the external cameras and
the control of their pan and tilt motors. The Worker Robot System is responsible for
control and data interpretation of actuators and sensors on the robot, and contains
Navigation, Manipulator, Camera, and Sonar subsystems. The Laser Beacon
Positioning System is responsible for the control of beacon motors, and interpretation
of laser beacon data to find object positions.
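
The upper part of the hierarchy described above can be written down as a small
tree. This is only a structural sketch: the class Node and its command method
are hypothetical and are not the RCS API, and a real RCS node would decompose a
task into different sub-commands rather than forward the same one, which is done
here purely to show the chain of command.

```python
class Node:
    """A controller node that passes commands down to its subordinates."""

    def __init__(self, name, subordinates=()):
        self.name = name
        self.subordinates = list(subordinates)

    def command(self, cmd, depth=0):
        """Return the (depth, node name, command) triples reached, depth-first."""
        reached = [(depth, self.name, cmd)]
        for node in self.subordinates:
            reached.extend(node.command(cmd, depth + 1))
        return reached


# Node names are taken from the text; lower levels are omitted.
master = Node("Master Controller", [
    Node("External Camera System"),
    Node("Worker Robot System", [
        Node("Navigation"), Node("Manipulator"), Node("Camera"), Node("Sonar"),
    ]),
    Node("Laser Beacon Positioning System"),
])
```

Because each node only knows its direct subordinates, any subtree (for example
the Worker Robot System alone) can be built and exercised separately, which is
the testing benefit of hierarchical designs noted earlier.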







[Figure: controller hierarchy diagram. Top: operator interface. Bottom level
(sensors and actuators): Ext. Cam1 and motors; Ext. Cam2 and motors; wheel motors,
orientation sensor, magnetic disc; fixed camera; manipulator joints, tool; n sonar
sensors; Beacon 1; Beacon 2; Beacon 3.]

Fig. 1. Controller Software Hierarchy

The commands accepted by all nodes, and their behavior in response to these 
commands have been analyzed in detail. 



4 Implementation 

Debian GNU/Linux is selected as the operating system on which the software is
implemented, because of its reliability, ease of software development, speed, true
multi-tasking and bundled tools. RT-Linux is selected for nodes that have hard
real-time requirements.

C++ is selected as the main development language.
This is both because C++ is widely supported and is
considered an industry standard, and because C++ classes handling communication
between different modules are already provided with RCS.

A software "shell", which consists of all the modules that are shown in Fig. 1, is
implemented. The implementation of the communication paths and messages between all
modules, and of the node bodies, is complete. This software shell provides a basis for
easily adding control algorithms and logic to each module.

Motor control is implemented in RT-Linux. The algorithm is integrated into the
system as the Navigation Servo and Navigation Prim nodes of the controller software
hierarchy. The EMC (Enhanced Machine Controller) project was used as a reference
example during implementation [15]. The interface is through a PC parallel port, and
an isolation circuit was developed to protect the port from damage.

The Ext.Cam. Servo and Camera Servo nodes of the control software hierarchy are 
implemented by developing software that performs frame grabbing through a BT878 
chip based video capture card. 












For the recognition of objects from captured images, several image processing and 
object recognition methods are used. In this phase of the research, the emphasis is on 
the incorporation of image processing nodes into the controller hierarchy. Thus, the 
level of image processing is limited to the recognition of artificially created objects 
with geometric shapes. 

The first step taken in object recognition is finding the edges in the image. Canny
edge detection is selected as it performs better in noisy images than using Sobel
operators, or LoG operators, and slightly better than Maxgrad edge detection [16]. Only
edges that form closed shapes with boundaries greater than a threshold are retained.
These are assumed to represent objects in the current image.

Object recognition is performed by extracting several features from images and
matching them with features of models. Only features that are invariant to linear
transformations are considered. Thus, it is decided to use seven moment invariants, ten
selected Fourier descriptors, and the circularity metric [16-18].
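
As an illustration of one such feature, the sketch below computes a circularity
value for a closed polygonal boundary. The definition used, 4*pi*area divided by
the squared perimeter, is a common one but is an assumption here, since the text
does not state which formula was used; the function name circularity is
hypothetical.

```python
import math


def circularity(polygon):
    """4*pi*area / perimeter**2 for a closed polygon given as a list of (x, y)."""
    area2 = 0.0
    perimeter = 0.0
    for (x0, y0), (x1, y1) in zip(polygon, polygon[1:] + polygon[:1]):
        area2 += x0 * y1 - x1 * y0                 # shoelace formula (twice the area)
        perimeter += math.hypot(x1 - x0, y1 - y0)  # length of this boundary edge
    return 4.0 * math.pi * (abs(area2) / 2.0) / perimeter ** 2
```

Under this definition a circle scores 1 and a square scores pi/4, and the value
does not change when the shape is translated, rotated or uniformly scaled, which
is what makes it usable as a shape feature for matching.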

A back-propagation neural network is used for classification. The network is trained
on the features extracted from the training set and on noisy versions of them. Test
results show that this technique gives good results under noise. This software is
incorporated into the Camera Prim., Camera Subsys., Ext. Cam. Prim., Ext. Cam. Subsys.,
and Ext. Cam. Sys. nodes in Fig. 1, Controller Software Hierarchy.

5 Conclusions 

In this research, system requirements for an autonomous robotic system for
maintenance of inclined FM surfaces are analyzed, a system design is developed, and
operation behaviors are analyzed.

A suitable software system to control the suggested system is designed.
It is a hierarchical system based on the RCS reference model architecture, in
order to reduce system complexity and make the software system manageable,
extensible and alterable.

During implementation, a system shell supporting the proposed software
architecture, motor control, image capture using a frame grabber card, and image
processing to recognize objects in these images are implemented. It has been seen that
separately developed programs can be easily incorporated into the hierarchical software
shell, validating the maintainability and extensibility properties of RCS.



References 

1. Huang, H. M., Horst, J., Quintero, R., "A Motion Control Algorithm for a
Continuous Mining Machine Based on A Hierarchical Real-Time Control System
Design Methodology," Journal of Intelligent Robotic Systems, 1991.

2. Huang, H. M., "An Architecture and A Methodology for Intelligent Control,"
IEEE Expert, vol. 11, no. 2, pp. 46-55, April 1996.

3. Huang, H. M., Young, K., Quintero, R., "Submarine Automation: Demonstration
#5," NISTIR 5676, National Institute of Standards and Technology,
Gaithersburg, June 1995.

4. Albus, J. S., "An Engineering Architecture for Intelligent Systems," Proceedings
of the American Association for Artificial Intelligence (AAAI), Fall Symposium
Series, 1996, Massachusetts Institute of Technology, Cambridge, MA,
November 9-11, 1996.

5. Juberts, M., Murphy, K., Nashman, M., Scheiderman, H., Scott, H., Szabo, S.,
"Development and Test Results for a Vision-Based Approach to AVCS," 26th
ISATA Meeting on ATT/IVHS, Aachen, Germany, September 1993.

6. Szabo, S., Scott, H. A., Murphy, K. N., Legowik, S. A., "Control System
Architecture for a Remotely Operated Unmanned Land Vehicle," Proceedings of the
5th IEEE International Symposium on Intelligent Control, Philadelphia, PA,
September 1990.

7. Antsaklis, P. J., "Intelligent Control," Encyclopedia of Electrical and Electronics
Engineering, John Wiley & Sons, Inc., 1997.

8. Advanced Technology and Research Corp., "The RCS Methodology," The
Advanced Technology and Research Corp. Web Site, http://www.atrcorp.com,
1999.

9. Balakirsky, S. B., Salonish, M. J., Allen, S. D., Messina, E., Salinas, J.,
"Advanced MMI/MP for Demo III XUVs," Proceedings of the SPIE AeroSense 98
Conference, Orlando, FL, April 13-17, 1998.

10. Albus, J. S., "Technology Requirement to Implement Improved Situation
Awareness: Machine Perception," Proceedings of AGARD Conference on
Future Aerospace Technology, Paris, April 14-16, 1997.

11. Huang, H. M., Scott, H., Messina, E., Juberts, M., Quintero, R., "Intelligent
System Control: A Unified Approach and Applications", Chapter in Gordon and
Breach International Series in Engineering, Technology and Applied Science,
Volumes on "Expert Systems Techniques and Applications," 1998.

12. Nashman, M., Yoshimi, B., Hong, T. H., Rippey, W. G., Herman, M., "A
Unique Sensor Fusion System for Coordinate Measuring Machine Tasks," SPIE
Internat'l Symp. on Intelligent Systems & Advc. Manufact. Session: Sensor
Fusion & Decentralized Control in Autonomous Robotic Systems, Pitts.., PA,
10/97.

13. Albus, J. S., "The Engineering of Mind," Proceedings of the Fourth International
Conference on Simulation of Adaptive Behavior: From Animals to Animats 4,
Cape Cod, MA, September 9-13, 1996.

14. Shackleford, W., "The NML Programmer's Guide (C++ Version)," The
Intelligent Systems Division Web Site, http://isd.cme.nist.org/proj/ rcs lib, 1999.

15. Proctor, F., "EMC Software," NIST Intelligent Systems Division Web Site,
http://isd.cme.nist.org/proj/emc/emcsoft.html, 1999.

16. Gonzalez, R. C., Woods, E. W., "Digital Image Processing," Addison-Wesley
Publishing Co., September 1993.

17. Thiel, S. U., "The Use of Image Processing Techniques for the Automated
Detection of Blue-Green Algae," Ph.D. Thesis, University of Glamorgan,
November 1994.

18. Image Processing Fundamentals Course, Delft University,
http://www.ph.tn.tudelft.nl/Courses/FIP/frames/fip-Contents.html, 1998.




Intelligent and Self-Adaptive Interface 



Claude Duvallet¹, Hadhoum Boukachour², Alain Cardon²

¹ LIH, Faculté des Sciences et Techniques,
25, Rue Philippe Lebon,
76058 Le Havre Cedex, France
Claude.Duvallet@univ-lehavre.fr
² LIH, Institut Universitaire de Technologie
Place Robert Schuman
76610 Le Havre Cedex, France
{Hadhoum.Boukachour, Alain.Cardon}@iut.univ-lehavre.fr



Abstract. Information systems for decision-making must provide
basic elements to the decision-maker in a synthetic and simple way. It is
therefore necessary to build a Computer Human Interface (CHI) adapted as
well as possible to the perceptions of different users, and designing
applications that provide an intelligent and self-adaptive CHI seems more
and more necessary. An essential characteristic of this kind of CHI is the
capacity to adapt to the environment and to the user's behavior, and to
permit the addition of components without calling the design of the CHI
into question. To obtain these characteristics in an application, modelling
with intelligent agents seems well suited because it makes it possible to
take into account the complex interactions present in the CHI.



Keywords: multi-agent systems, adaptive systems, computer human interface.



1 Introduction 

Over the last decades, computer systems have evolved rapidly, and this
evolution especially concerns CHIs. Whereas the first computer systems offered
only a single line-based interface for interacting with the user, computer
applications are now based on a graphical representation of the CHI (windows,
dialog boxes, ...). This evolution of CHIs is an important improvement, because
it allows people who are not specialists to use computers. It also reduces the
training time that these users need.

The use of computer systems has become widespread in recent years.
Nevertheless, using them remains difficult for people who have no experience
in the computer domain. Moreover, each person uses a system in a different
way. It would therefore be appropriate to have a different CHI for each
category of user. To answer these needs, we propose the design of a
self-adaptive CHI based on modelling with agents [6]. This evolution of the
CHI constitutes the next important step in Computer-Human Interaction [1] [3].

R. Loganantharaj et al. (Eds.): IEA/AIE 2000, LNAI 1821, pp. 711-716, 2000.
© Springer-Verlag Berlin Heidelberg 2000

The CHI should henceforth rest on a separation between the objects of the
domain and the objects of the interface (cf. Fig. 1). To achieve this, we use
the MVC model of SmallTalk [2], as described in [1]. The idea is to develop a
user interface that is independent of the domain. In particular, it must be
possible to modify the CHI, in part or in whole, without changing the actions
of the underlying application. In a self-adaptive and intelligent CHI, these
modifications take place either at the user’s request or automatically.
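The separation between domain objects and interface objects can be illustrated by a minimal Java sketch. It is not taken from the systems described in this article: the names DomainModel, ModelListener, TextView and BarView are hypothetical, and the sketch only assumes an observer-style link between the domain and its views.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical names throughout; this is an illustration, not the authors' code.

/** Interface object contract: any view implements this to follow the domain. */
interface ModelListener {
    void modelChanged(DomainModel model);
}

/** Domain object: it knows nothing about how it is displayed. */
class DomainModel {
    private final List<ModelListener> listeners = new ArrayList<ModelListener>();
    private double value;

    void addListener(ModelListener l) { listeners.add(l); }
    void removeListener(ModelListener l) { listeners.remove(l); }
    double getValue() { return value; }

    void setValue(double newValue) {
        value = newValue;
        // Iterate over a copy so that a view may detach itself while being notified.
        for (ModelListener l : new ArrayList<ModelListener>(listeners)) {
            l.modelChanged(this);
        }
    }
}

/** One possible view: renders the value as text. */
class TextView implements ModelListener {
    String lastRendering = "";
    @Override public void modelChanged(DomainModel model) {
        lastRendering = "value=" + model.getValue();
    }
}

/** An alternative view: swapping it in requires no change to DomainModel. */
class BarView implements ModelListener {
    String lastRendering = "";
    @Override public void modelChanged(DomainModel model) {
        long n = Math.max(0, Math.round(model.getValue()));
        StringBuilder sb = new StringBuilder();
        for (long i = 0; i < n; i++) sb.append('#');
        lastRendering = sb.toString();
    }
}
```

Replacing TextView by BarView at run time changes the presentation only; DomainModel is untouched, which is the property required of the CHI above.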

In this article, we propose a model of a self-adaptive CHI based on modelling
with agents. Section 2 presents the reasons that led us to design an
intelligent CHI. Section 3 describes the agent-based model of an intelligent
CHI that we have developed. Section 4 illustrates the application of this work
through two systems and presents our implementation choices. We conclude with
the possibility of extending this work and of using it in industrial
applications.




Fig. 1. A model to design CHI 



2 Why an Intelligent CHI? 

In the introduction, we gave some reasons why intelligent and self-adaptive
CHIs need to be designed. The first is the inefficiency of classical CHIs in
the face of the new needs of applications, which must give users the best
possible understanding of the computer system. A well-designed CHI must answer
the specific needs of each user. It therefore seems appropriate for a CHI to
adjust itself to the user’s behavior.

Besides, the maintenance of classical CHIs requires a programmer to intervene
and redo their design entirely. To reduce the cost of this maintenance, the
design of interfaces must be entirely determined by the user and must have a
broad capacity for evolution. This kind of interface can be realized as an
adaptive CHI built from modular components. Indeed, it must be possible to
modify the interface (to add, remove or modify components) without changing
the core of the computer system.

An appropriate solution for future applications is to design them so that they
provide an intelligent and self-adaptive CHI. Our viewpoint is that the
complexity of the interactions inside an intelligent CHI justifies the use of
multi-agent systems based on the notion of light agents [6]. We must take into
account both the interactions between applications and users, and the
modifications of the CHI induced by the evolution of the data processing.

In our work, the intelligent and self-adaptive CHI is therefore modelled with
agents.



3 A Model Using Agents 



In this section, we model an intelligent CHI that adjusts automatically to
each user’s behavior by means of different organizations of agents. These
organizations fall into two main categories:

- User agents: this organization of agents allows each user to personalize
his CHI directly and to interact with the intelligent agents through it.

- Interfacing agents: they manage the different elements of the interface in
order to answer the contextual needs of the application. They are not
directly in contact with the user.



3.1 User Agents 

Personalizing the CHI through agents makes the CHI more flexible. Users can
call on agents that let them personalize the interface themselves so that it
answers their needs. The aim is to design CHIs that are entirely
parameterizable, without requiring the user to spend considerable effort or
time.

The possibility of increasing the number of agents, and of adding new ones in
order to provide new ways of interacting with the data processing system, is
one of the essential features that leads us to use an organization of agents.
User agents are reactive agents that have very little knowledge but can call
on the interfacing agents described in the next section.








Fig. 2. Modelling an intelligent CHI by multi-agent systems



3.2 Interfacing Agents 

Interfacing agents, as outlined at the beginning of this section, have to
manage the whole interface and, notably, the modelling of the user. We
distinguish four categories of interfacing agents, described below.

3.2.1 Agents of Maintenance of the Interface. This first organization
consists of reactive agents. Their role is limited to using and organizing the
different interface components in order to manage the CHI. They can add,
modify, or remove components within the CHI. New components can appear in the
system by composition of existing components. Every component is managed by a
maintenance agent. Creating a new component means creating a new agent whose
structure is predefined by a generic maintenance agent architecture.
Maintenance agents may have to collaborate with user agents and with the
agents of construction of the interface described in the next paragraph.

3.2.2 Agents of Construction. These are cognitive agents whose knowledge of
the application domain allows them to construct a CHI that corresponds to the
representation needs of the application. They manage the elements that
represent the results of the application, as well as the structuring of the
different elements that make Computer-Human Interaction possible. Globally,
they are in charge of the graphical interpretation of the state of the system
and of the subjective representation of this state. They interact mainly
within their own organization, but they also interact with the organization of
user agents and with the other organizations of interfacing agents.

3.2.3 Agents of Capture. The agents that we call agents of capture take into
account the requests that users address to the computer system. They capture
the elements transmitted through the interface and turn them into elements
that the computer system can understand. These agents are cognitive because
they hold the knowledge needed to perform this transformation. This knowledge
is given to the system at design time through the definition of an ontology.

3.2.4 Agents of Follow-Up of the Evolution of the Situation. These agents
present the follow-up of the application to the user and alert the user to
abrupt changes in the evolution of the situation. Recall that our work is set
in the context of applications designed for decision support. For example, in
an application that monitors a temperature curve, it is sometimes necessary to
alert the user when a threshold is exceeded. From knowledge acquired when the
system is designed, the organization of follow-up agents constructs an
analysis of the evolution of the situation.
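The temperature example can be sketched as follows. This is only an illustration under stated assumptions: the class ThresholdFollowUpAgent, its methods and the alert format are hypothetical, and the agent is reduced to a single rule that raises one alert each time the curve crosses the threshold upward.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical follow-up agent (illustrative only): it watches a temperature
 * curve and records one alert for each upward crossing of a fixed threshold.
 */
class ThresholdFollowUpAgent {
    private final double threshold;
    private boolean above = false;               // state of the previous reading
    private final List<String> alerts = new ArrayList<String>();

    ThresholdFollowUpAgent(double threshold) {
        this.threshold = threshold;
    }

    /** Feeds one reading of the monitored curve to the agent. */
    void observe(int time, double temperature) {
        boolean nowAbove = temperature > threshold;
        if (nowAbove && !above) {
            // Alert only on the transition, not on every reading above the threshold.
            alerts.add("t=" + time + ": threshold " + threshold
                    + " exceeded (" + temperature + ")");
        }
        above = nowAbove;
    }

    List<String> getAlerts() {
        return alerts;
    }
}
```

Alerting on the transition rather than on every high reading keeps the user from being flooded while the curve stays above the threshold.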

4 Implementation and Applications 

The model described above is being applied in two applications: the first is
a decision-support application for electronic market management; the second is
an application for the monitoring of industrial areas with high technological
risk. We begin this section by presenting the implementation choices made for
these two applications.



4.1 Implementation Choices 

The design of these two applications requires a distributed architecture
because they are multi-user applications. Users interact through different
interfaces connected to a computer system, and these interactions are based on
a CORBA architecture. The multi-agent system development platform MadKit [5]
was chosen for its flexibility of use and its portability across operating
systems [4]. This platform was developed at the LIRMM laboratory of the
University of Montpellier by Olivier Gutknecht in the Java language, which we
will therefore use.

4.2 Application to Electronic Trade Management

In a computer-based application that assists trading, users must have a
system that gives them the information they need to decide whether to sell or
to buy a product. Likewise, the CHI must let them carry out purchase and sale
transactions. Another function of this CHI is to warn the user when the
situation undergoes a significant change that may interest him. To take this
last point into account, a model of the user and of his habits is necessary.

4.3 Application to the Monitoring of High-Risk Industrial Sites

The presence of industrial sites with high technological risks (chemical
plants and petroleum plants) in urban areas makes it necessary to install a
network of alert sirens for the confinement of the population. These sirens
must be triggered after a serious incident occurs. To make this detection
possible, a computer system is needed to monitor the dangerous industrial
sites. This system should report on the situation by means of a CHI. This CHI
should allow the operator to grasp the gravity of the situation quickly. It
must be able to present and highlight the essential elements of a situation.
This application will be designed within a project whose objective is to
install a siren network for the city of Le Havre in France.

5 Conclusion and Perspectives 

In this article, we have presented the design of an intelligent and
self-adaptive CHI based on the agent paradigm. This work is part of a larger
effort on the design of decision support systems. We are currently designing
these applications. Our future work on the CHI will take into account a user
model endowed with learning capacities that give it a closer fit with the
user’s personality and habits.



References 

1. I. Akoulchina. SAGE: un agent intelligent d’interface pour un média à base
de connaissances taxinomiques fonctionnant dans l’environnement du WEB. PhD
thesis, Université de PARIS, Octobre 1998.

2. X. Briffault and G. Sabah. SmallTalk, programmation orientée objet et
développement d’application. Editions Eyrolles, 1996.

3. G. Patry. Contribution à la conception du dialogue Homme-Machine dans les
applications graphiques interactives de conception technique : Le Système
GIPSE. PhD thesis, Université de Poitiers, ENSMA, Mars 199.

4. O. Gutknecht and J. Ferber. Madkit: Organizing heterogeneity with groups in
a platform for multiple multi-agent systems. Technical report, LIRMM, UMR
9928, Université de Montpellier II, 1999.

5. MADKIT. http://www.lirmm.fr/~gutkneco/madkit.

6. M. Wooldridge and N. Jennings. Agent theories, architectures and languages:
A survey. In M. Wooldridge and N. Jennings, editors, Intelligent Agents, ECAI
1994, volume LNAI 890, pages 1-32. Springer Verlag, 1994.




Agent Architecture:
Using Java Exceptions in a Nonstandard Way
and an Object Oriented Approach to Evolution of Intelligence



Cengiz Günay

Center for Advanced Computer Studies
University of Louisiana at Lafayette, Lafayette LA 70504, USA
Cengiz@ULL.edu



Abstract. We present the use of Java exceptions as programming constructs in
artificial intelligence algorithms. The Java language specification provides
exceptions for handling erroneous cases. Although it is generally advised not
to use exceptions for purposes other than error handling, we show that in
certain situations they can be used to increase the expressiveness of the
language.
Secondly, we describe the advantages of incremental program design using
object oriented schemes. An agent architecture project is presented with
emphasis on incremental development and the reuse of previous modules. We show
that complicated behaviour can be obtained by making design decisions that
generalize the function of the modules created, allowing them to be reused for
future developments and possible improvements.

Keywords: Agent architecture, artificial intelligence, software reuse, 
software exceptions, reinforcement learning, agent communications, 
object-oriented software design. 



1 Introduction 

This paper gives an account of some Java programming techniques used in
developing software for autonomous agent navigation in a grid world. The
project began with the implementation of the wall following robot described in
[1], and developed into a multi-agent environment capable of cooperative
communication and navigation using Q-learning [2].

This paper focuses on some points reached during the development of the
project that differ from those initially expected. They can be summarized as
follows:

- An agent architecture is created with separate interfaces for grid, body
and mind, trying to imitate the physical world as closely as possible.

- Java language exceptions are used in a nonstandard manner as programming
constructs, rather than only for error handling. The advantages of the
approach are supported by examples of simplified algorithms. It is claimed
that this type of exception usage is convenient in reinforcement procedures.

- The object oriented properties of the Java language are used to maximize
reuse of previously written procedures during development. The development
is evolutionary: functionality is extended by reusing existing functionality,
instead of improved functionality overriding the previous behavior. In this
way, incremental development of intelligence is presented, and it is claimed
to be analogous to the development of intelligence in biological systems.

R. Loganantharaj et al. (Eds.): IEA/AIE 2000, LNAI 1821, pp. 717-722, 2000.
© Springer-Verlag Berlin Heidelberg 2000



2 Interfacing of the Classes Grid, AgentBody and AgentAI 

The interfacing of the components is designed to be realistic. Some
operations performed under these interfacing restrictions might therefore seem
too limiting and indirect. Closer inspection of the scheme reveals, however,
that separating the entities into different classes helps in developing high
level architectures and offers flexibility in the development process.

The interfaces are described with examples in the following subsections.
Specific points about the examples are also mentioned, such as the
simplifications obtained by using Java exceptions in a nonstandard way in § 2.1.



2.1 Example of an Attempt to Move 

Moving an agent is more complicated than visual perception. The scenario
starts with a user command to the instance of Grid, calling the
Grid.advanceTime() method, which triggers the events. This method sends a
request to move to every instance of AgentBody contained on the Grid by
calling its AgentBody.requestMove() method. This method is given in
Algorithm 1, where agent is the instance of AgentAI associated with the
AgentBody.

Algorithm 1 AgentBody.requestMove()

public void requestMove()
    throws AgentSuccessfulException {
  try { agent.move(); }   // ask the mind to decide and act
  catch (ObstacleInPathException e) {
    return; // Couldn't move :(
  }
}

Here, note that the AgentAI.move() method can complete normally, or it can
throw either AgentSuccessfulException (the agent has reached its goal) or
ObstacleInPathException (the move has failed). The relevant source code from
AgentAI.move() is given in Algorithm 2.






Algorithm 2 AgentAI.move()

public int move()
    throws AgentSuccessfulException,
           ObstacleInPathException {
  int dir = decideDirection();  // "think": may signal that the goal is reached
  agentBody.move(dir);          // "act": may fail on an obstacle
  return dir;
}



We have gone from the grid world (Grid) to the agent’s “mind” (AgentAI), via
its “body” (AgentBody), to ask the agent to make a move. The agent then
“thinks” (calls AgentAI.decideDirection()) and acts on its decision by again
consulting its body (AgentBody). The call to the AgentBody.move() method can
result in failure, or AgentAI.decideDirection() can result in satisfying the
ultimate goal of the agent. In either case the respective exception is thrown
to the caller of AgentAI.move(). Listing the relevant parts of
AgentBody.move() in Algorithm 3 completes the definition of the moving
mechanism, where Grid.move() is called to realize the decision of the agent’s
mind.



Algorithm 3 AgentBody.move()

public Block move(int direction)
    throws ObstacleInPathException {
  try { grid.move(x, y, direction, ...); }
  catch (ObstacleInPathException e) {
    agent.ouch(); // Hit the wall!
    throw e;      // Do nothing, move fails
  }
  ...; // Complete moving process
  return ...;
}



Simplification obtained by using Java exceptions: AgentBody.move(), given in
Algorithm 3, fails when Grid.move() raises the exception
ObstacleInPathException. Although AgentBody.move() catches the exception, it
throws the same exception again to its caller AgentAI.move(), after first
placing a call to AgentAI.ouch()^1.

The function therefore ends prematurely at this point, before reaching the
post-processing statements that should follow a successful move. The exception
is used in such a way that the linear flow of the program is not cluttered by
error checks, yet proper execution is obtained without side effects.
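The mechanism of Algorithms 1 to 3 can be tried out with the following self-contained sketch. It is a simplified reconstruction, not the project's source: a one-dimensional corridor stands in for the grid, the classes CorridorGrid, SketchBody and SketchMind are hypothetical, and every method body is an assumption. Only the two exception names and the call chain follow the text.

```java
// Simplified, hypothetical reconstruction of the call chain of Algorithms 1-3.

class ObstacleInPathException extends Exception {}
class AgentSuccessfulException extends Exception {}

/** A corridor of cells 0 .. length-1 with a food cell (assumed stand-in for Grid). */
class CorridorGrid {
    final int length;
    final int foodCell;
    CorridorGrid(int length, int foodCell) { this.length = length; this.foodCell = foodCell; }

    /** Returns the new position, or signals a wall by exception. */
    int move(int x, int direction) throws ObstacleInPathException {
        int target = x + direction;
        if (target < 0 || target >= length) throw new ObstacleInPathException();
        return target;
    }
}

class SketchBody {
    final CorridorGrid grid;
    SketchMind agent;       // set by the mind's constructor
    int x;
    int stepsTaken = 0;
    SketchBody(CorridorGrid grid, int x) { this.grid = grid; this.x = x; }

    /** Called once per time step; a failed move is simply absorbed here. */
    void requestMove() throws AgentSuccessfulException {
        try { agent.move(); }
        catch (ObstacleInPathException e) { /* could not move this step */ }
    }

    void move(int direction) throws ObstacleInPathException {
        try { x = grid.move(x, direction); }
        catch (ObstacleInPathException e) {
            agent.ouch();   // tell the mind about the collision
            throw e;        // skip the post-processing below
        }
        stepsTaken++;       // post-processing, reached only after a successful move
    }

    boolean atFood() { return x == grid.foodCell; }
}

class SketchMind {
    final SketchBody body;
    int direction = -1;     // start by walking left, towards the wall
    int ouchCount = 0;
    SketchMind(SketchBody body) { this.body = body; body.agent = this; }

    int move() throws AgentSuccessfulException, ObstacleInPathException {
        int dir = decideDirection();
        body.move(dir);
        return dir;
    }

    int decideDirection() throws AgentSuccessfulException {
        if (body.atFood()) throw new AgentSuccessfulException(); // goal reached
        return direction;
    }

    void ouch() { ouchCount++; direction = -direction; }  // turn around at a wall
}
```

A driver in the role of Grid.advanceTime() would call requestMove() once per time step and stop when AgentSuccessfulException arrives; neither the body nor the mind needs a return code to distinguish a normal move, a collision and success.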



^1 Pain sensation in the mind.






3 Incremental Programming and Code Reuse 




[Figure: class and method extension diagram]

Legend
Caption row has class names ordered from left, each extending the one on the right.
Boxed identifiers are method names. Same names override the same method
signature, but a reverse arrow indicates making use of the predecessor method.
Diamonds indicate AgentBody compliant methods that are called externally.
Double lined arrows show overloading methods.
Single lined arrows indicate one method calling another.

Fig. 1. AgentAI class extension hierarchy and method reuse.



This project is the product of an incremental effort. It starts with a
generic agent definition and gradually builds complicated intelligent behavior
out of primitive functions.

The design decisions at each step of introducing new agent capabilities have
been made carefully, to keep the design schema incremental. This incremental
development principle is briefly presented through the class and method
extension graph given in Figure 1. It is also claimed that, besides producing
concise and robust code, this incremental approach is analogous to the
biological evolution of complicated intelligent behavior.

Brief explanations for each agent architecture, defined by the respective
class in the figure, follow.

AgentAI: The abstract base class for the agent’s intelligence. Its primary
purpose is to define the interface of the agent’s mind with its body. It also
contains basic code that is used by all agents. The constructor is called at
the time of instantiation, AgentAI.bringToLife() is called when the agent is
placed on the grid world, AgentAI.move() (see Algorithm 2) is called by
AgentBody when it is time to move, and AgentAI.ouch() is called when the
agent decides wrong. Except for the AgentAI.decideDirection() method, all
these base methods are still called even by the most evolved agent, which
calls the antecedent object’s method explicitly where it is hidden by the
overriding method.

DumbAgent: This agent’s goal is to find a single food block placed in the real 
grid world. To achieve this goal, it randomly chooses directions until successful. 

OlfactoryAgent: Uses the same concepts, but always heads towards the food, as
if obtaining the direction by smell. This is done by adding the necessary
capabilities to Grid and AgentBody. There is still a chance that the agent
will determine its course by a random decision (to avoid getting stuck).

RecallerAgent: Making use of the step count of the Block objects, it heads
towards the least visited block among its possible directions. One property to
note is that it uses the same methods as the OlfactoryAgent for going to
corners. When the decision is not one of the immediate four main directions,
it uses OlfactoryAgent.orientedDirection(), where the general direction of the
goal is evaluated and a direction is returned for the current move.

ExplorerAgent: Its aim is to explore the whole grid. It generates an iconic
map (the mental grid) from the information it gets by visual perception. It
uses OlfactoryAgent.orientedDirection() to head towards reachable unexplored
blocks that it remembers. Note that its operation is much more complicated,
but it makes the following architectures, which extend this class, much
simpler, as seen from the figure.

LearnerAgent: Uses the Q-learning algorithm, a form of reinforcement learning
taken from [2]. This method is used only for deciding the path towards an
already known goal location. Note that LearnerAgent uses the same
decideDirection() method as ExplorerAgent. Therefore, overloading only the
ExplorerAgent.towardsAGoal() method is sufficient for the implementation.
This agent has some additional tasks that differ from those of the food
seeking agents evolved up to this point. It first explores the grid world,
then starts to collect balls and return them to deposit bins. The goals are
therefore unexplored blocks in the first phase, and ball or deposit bin blocks
in the next phase. This distinctive functionality is easily obtained by
overloading only the ExplorerAgent.isFavorable() method, whose sole purpose is
to judge whether a given position on the grid is eligible as a goal location.
For overall control of the different mind sets of the agent, the
ExplorerAgent.move() method has also been overloaded.
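The reuse pattern described for LearnerAgent can be sketched in a few lines. The classes ExplorerSketch and LearnerSketch, the method decideGoal() and the character encoding of cells are hypothetical; only the name isFavorable comes from the text, and its signature here is an assumption. The point shown is that the subclass redefines one small predicate, falls back on the inherited version where appropriate, and reuses the decision routine unchanged.

```java
// Hypothetical, heavily simplified classes; not the project's source code.

class ExplorerSketch {
    /** Judges whether a cell is eligible as a goal; explorers want unexplored cells ('?'). */
    boolean isFavorable(char cell) {
        return cell == '?';
    }

    /** Shared decision routine: index of the nearest favorable cell, or -1 if none. */
    int decideGoal(char[] cells, int position) {
        int best = -1;
        for (int i = 0; i < cells.length; i++) {
            if (isFavorable(cells[i])
                    && (best == -1 || Math.abs(i - position) < Math.abs(best - position))) {
                best = i;
            }
        }
        return best;
    }
}

class LearnerSketch extends ExplorerSketch {
    boolean exploring = true;   // first phase: explore; second phase: collect balls ('o')

    @Override
    boolean isFavorable(char cell) {
        // While exploring, defer to the inherited judgement; afterwards look for balls.
        return exploring ? super.isFavorable(cell) : cell == 'o';
    }
}
```

Because decideGoal() calls isFavorable() through dynamic dispatch, the learner changes what counts as a goal without touching how a goal is chosen.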

TalkingAgent: The only difference of this agent is its capability to
communicate with others of its own kind on the same grid. The content of the
message passed between agents is limited to their visual field of eight
blocks. There is a protocol for sending this message to other agents, which is
very similar to the mechanism defined for moving an agent in § 2.1. Briefly,
the Grid is reached via the AgentBody to broadcast the message to every
available AgentBody (and therefore every TalkingAgent) on the Grid. The amount
of code introduced for this class is surprisingly small, especially in the
TalkingAgent.hearSurr() method. Here, another agent is created as if it
existed only in the hearing agent’s mind; it is given a fake AgentBody
(ImaginaryAgentBody, extended from AgentBody) and placed on the mental grid of
the hearing agent. The newly created agent is born via a call to
TalkingAgent.bringToLifeAt() (which eventually calls
ExplorerAgent.bringToLife()) at the point where the message is known to have
been generated. As the new agent sees its surroundings, it updates its mental
grid with this information. But rather than using its natural perception,
since its body is an instance of ImaginaryAgentBody, the
ImaginaryAgentBody.see() method returns the surroundings as given in the
message to the hearing agent. The new agent is discarded at the end of the
TalkingAgent.hearSurr() method.
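The imaginary-body idea can be illustrated as follows. The sketch is one-dimensional and all names (BodySketch, ImaginaryBodySketch, MapperSketch, TalkerSketch, hearSurroundings) are hypothetical; it assumes only what the paragraph states, namely that a temporary agent with a fake body is brought to life on the hearer's mental grid and that the fake body reports the surroundings contained in the message.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical one-dimensional sketch; names and signatures are illustrative only.

/** What a mind needs from a body: where it is and what it currently sees. */
interface BodySketch {
    int position();
    char[] see();   // cells centred on position()
}

/** A body that exists only in a hearer's mind: it "sees" what a message reported. */
class ImaginaryBodySketch implements BodySketch {
    private final int position;
    private final char[] reportedView;
    ImaginaryBodySketch(int position, char[] reportedView) {
        this.position = position;
        this.reportedView = reportedView.clone();
    }
    @Override public int position() { return position; }
    @Override public char[] see() { return reportedView.clone(); }
}

/** A mapping mind: on being brought to life it records what its body sees. */
class MapperSketch {
    final Map<Integer, Character> mentalGrid;
    MapperSketch(Map<Integer, Character> mentalGrid) { this.mentalGrid = mentalGrid; }

    void bringToLife(BodySketch body) {
        char[] view = body.see();
        int start = body.position() - view.length / 2;
        for (int i = 0; i < view.length; i++) {
            mentalGrid.put(start + i, view[i]);
        }
    }
}

/** A talking mind: it reuses the mapping code unchanged to absorb a heard message. */
class TalkerSketch extends MapperSketch {
    TalkerSketch() { super(new HashMap<Integer, Character>()); }

    void hearSurroundings(int senderPosition, char[] senderView) {
        // The imagined agent writes into the hearer's own mental grid.
        MapperSketch imagined = new MapperSketch(mentalGrid);
        imagined.bringToLife(new ImaginaryBodySketch(senderPosition, senderView));
        // 'imagined' goes out of scope here, i.e. it is discarded.
    }
}
```

No map-merging code is written for the message: the existing perception path does the work, and only the source of the perception is substituted.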

4 Conclusions 

We have presented the use of Java exceptions as programming constructs in
artificial intelligence algorithms. The Java language specification provides
exceptions for handling erroneous cases. Although it is generally advised not
to use exceptions for purposes other than error handling, we have shown that
in certain situations they can be used to increase the expressiveness of the
language.

Secondly, we have described the advantages of incremental program design using
object oriented schemes. An agent architecture project was presented with
emphasis on incremental development and the reuse of previous modules. We have
shown that complicated behavior can be obtained by making design decisions
that generalize the function of the modules created, allowing them to be
reused for future developments and possible improvements.

The architecture presented in this paper allows many future applications and
improvements. Using separate entities for the agent’s mind, its body and the
grid allows any of them to be customized independently, so that new
environments and agents can be simulated. An example of this easily
customizable behavior is given in § 3, in the description of the TalkingAgent
architecture.



Acknowledgments 

The author thanks Dr. Anthony S. Maida for comments and corrections on the 
draft of this article, and Dr. Rasiah Loganantharaj for initiating the project. 
The Java language is a trademark of Sun Microsystems, Inc. 



References 

1. Nilsson, Nils J.: Artificial Intelligence: A New Synthesis. Morgan Kaufmann
Publishers (1998)

2. Mitchell, Tom M.: Machine Learning. McGraw-Hill (1997)



Neural Network Based Machinability Evaluation 



Chris Nikolopoulos^1, Iqbal Shareef^2, and Donald Kalmes^3

^1 Department of Computer Science, Bradley University
chris@bumail.bradley.edu
^2 Department of Manufacturing and Industrial Engineering, Bradley University
shareef@hilltop.bradley.edu
^3 Caterpillar Inc., Peoria, Illinois, USA



Abstract. This paper reports on the progress of an ongoing research project to
investigate the feasibility of using artificial neural networks to shorten the
time required for machinability testing. A neural network model is used to
predict the cutting tool life for a given material. A short-term test has been
developed whose responses provide the input to the neural net. The results of
the long-term tests, ISO 3685, are used together with the short-term test data
for supervised training of the neural networks developed in this research.



1 Introduction 

It has been estimated that approximately 5% of GNP for developed countries is spent 
on machining operations. In the U.S. alone, nearly $80 billion is spent annually on 
machining operations. The need for an expeditious and accurate method to optimize 
these operations is apparent.^1 The development of an economical machining process 
requires knowledge of the tool life as a function of the machining parameters. The 
present standard for determining the tool life and machinability of steels (ISO 3685) 
requires three to four days of testing by a skilled machinist, just for the laboratory 
work. Analysis and presentation take additional time. In an era of relentless worldwide 
competition, this test is very time consuming. The aim of this research project is to 
investigate the feasibility of using artificial neural networks to shorten the time re- 
quired for machinability testing. In this research, a short-term test will be character- 
ized and the responses from this test will be used as input to a neural network for 
predicting the tool life of materials. The results of the long-term tests, ISO 3685, will 
be used together with the short-term test data for supervised training of the neural 
networks developed in this research. The success of this study would have broad im- 
plications in the machining industry. 

^1 We gratefully acknowledge the financial support of this project by the
Society of Manufacturing Engineers through its Research Initiation Program.

R. Loganantharaj et al. (Eds.): IEA/AIE 2000, LNAI 1821, pp. 723-730, 2000.
© Springer-Verlag Berlin Heidelberg 2000

The ISO 3685 test requires that a turning insert be used until the flank wear
limit of 0.012 inches is reached. Several of these tests are repeated to
develop a Taylor tool life graph. The member companies of the American Iron
and Steel Institute Machinability Sub-Committee of the Bar Applications Group
(AISI-BAG) have performed extensive studies to improve the repeatability of
ISO 3685 and to build a machinability database of some of the most commonly
used industrial steels. The following table shows the steels being tested by
the member companies and the steels made available for this research.

Table 1: AISI-BAG Member Companies and Steels Tested/Supplied

Member         Material Tested                    Material Supplied
Caterpillar    1045, 4118, 1018                   -
Chrysler       1045, 8620IC, 4620, 4118, 4320     -
Ford Motor     1045, 8620BC, 1018, 1345           8620BC, 1345
IAMS           1045, 1117, 4340                   -
North Star     1018, 8620BC                       -
Republic       1045, 1117, 1070                   1117
Stelco, Inc.   1045, 5160, 1345, 1070             1070, 5160
Timken         1045, 4620, 4340, 8620IC           4340, 8620IC



Other standards for predicting the machinability of steels exist, but they all
depend on machining with an insert until its wear limit has been reached.
There is no other accepted method for predicting tool life. A study performed
in Japan attempted to use the chemical composition of a material to predict
tool life; a fuzzy neural network was used with some success [1]. The only
other area of study for machinability has been on-line tool condition
monitoring.

A large number of studies have been performed to monitor tool condition during
a machining operation. The objective of these studies is to develop a reliable
methodology to detect a worn out tool during a production machining operation.
Because direct observation of tool wear during machining is difficult,
indirect means have been investigated. Cutting force, vibration, spindle motor
current, temperature, torque, strain, and acoustic emission measurements have
been investigated as indirect measures of tool wear [2]. Cutting conditions
and images of the cutting tool have also been investigated [2]. These outputs
of the cutting process measure more than tool wear alone; thus the salient
features must be extracted and correlated to tool wear. The analytic methods
used to extract and correlate data to tool wear have included pattern
recognition algorithms [3,4], normalization techniques, linear regression,
multivariate time series analysis [5], the analytic hierarchy process [5],
multiple regression analysis [5], expert systems [6], the group method of data
handling [5], and neural networks.

Many studies have been performed using neural networks to correlate indirect 
measurements to tool wear. Neural networks are used in this application because they 
are particularly useful in pattern recognition problems that involve capturing and 
learning complex underlying (but consistent) trends in the data. There are many 
different neural network paradigms; some of the models that have been attempted 
include adaptive resonance theory, [6], self-organizing map, [6], multi-layered feed 
forward, [7-16], fuzzy logic, [10,15], and radial basis function (RBF), [16]. 

The neural network paradigm that was initially experimented with in this research 
is the multi-layered feed-forward trained with back propagation (MLFBP) neural 
network. In addition to the input and output layers, MLFBP neural networks have a 
number of hidden layers of nodes. Each node in the input layer passes one component 
of the input vector to every node in the first hidden layer. For instance in the research 
the node corresponding to the feed force passes its value to every node in the first 
hidden layer. Each connection between nodes has a corresponding weight. The total 
input $h_i$ to each node $i$ beyond the input layer is calculated by: 

$$h_i = \sum_{j=1}^{k} w_{ij}\,\mathrm{out}_j$$

where $\mathrm{out}_j$ is the output of node $j$ connected to node $i$, $w_{ij}$ is the weight of the 
connection from node $j$ to node $i$, and there are $k$ input nodes to node $i$. The output 
of a node is the value of its transfer function for the total input to the node. The 
transfer function can be any differentiable function. Usually the step, hyperbolic 
tangent, or the sigmoid function is used. In this research, the sigmoid function is used: 

$$\mathrm{out}_i = \frac{1}{1 + e^{-h_i}}$$

The back propagation algorithm is used to train, or set the weights, of an MLFBP 
neural network. Back propagation is a gradient descent based algorithm that adjusts 
the weights in order to reduce the mean square error between the desired and actual 
output, [17]. 
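
The node computation described above (a weighted sum of the previous layer's outputs passed through the sigmoid) can be sketched in a few lines of Python. This is only an illustration: the function names `sigmoid` and `layer_output` and the example weights are assumptions, not taken from the software used in this research, and no bias term is included because none is mentioned in the text.

```python
import math

def sigmoid(h):
    """Sigmoid transfer function: 1 / (1 + e^(-h))."""
    return 1.0 / (1.0 + math.exp(-h))

def layer_output(weights, inputs):
    """Outputs of one layer of nodes.

    weights[i][j] is the weight of the connection from node j of the
    previous layer to node i of this layer; inputs[j] is out_j.
    """
    outputs = []
    for w_i in weights:
        # Total input h_i = sum over j of w_ij * out_j
        h_i = sum(w_ij * out_j for w_ij, out_j in zip(w_i, inputs))
        outputs.append(sigmoid(h_i))
    return outputs

# Hypothetical example: 3 input nodes feeding 2 hidden nodes.
hidden = layer_output([[0.0, 0.0, 0.0], [1.0, -1.0, 0.5]], [0.2, 0.2, 0.0])
```

Training by back propagation would then adjust the entries of `weights` along the negative gradient of the squared output error; that step is omitted here.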



2 Experimental Techniques 

A short term turning test was developed and used to gather the data to train a neural 
network. Twelve different grades of steel were tested (AISI 1045, 1070, 1117, 1141, 
1215, 1345, 4118, 4320, 4340, 5160, 8620BC, 8620IC). The short-term test consisted 
of 1 turning pass for 20 revolutions at a depth of cut of 0.05 inches, and 0.01 inch per 
revolution feed. The flank wear on the inserts was maintained between 0.0035” 
and 0.005” to minimize the effect of tool wear on the measured forces. Ten different 
speeds were used under dry cutting conditions. During each test the responses meas- 
ured were cutting force, feed force and thrust (radial) force. In addition to the three 
forces, two acoustic emission (AE) transducers of different frequency ranges were 
used to collect AE signals generated during machining. 

Each short-term test was repeated at least 3 times for the same conditions and each 
of the steels was tested at 10 speeds. The data was collected for 20 revolutions at 360 
samples per revolution. The last 5,000 data points in this data file were averaged to 
obtain one data point for neural network training. The data collected during the short- 
term test for this research were: 

• 3 forces, $F_x$, $F_y$, $F_z$. 

• 2 acoustic emission RMS sensor readings covering 2 frequency 
ranges: 50 to 400 Hz and 100 to 900 Hz. 
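
As a rough illustration of the data reduction described above, the sketch below averages the last 5,000 of the 7,200 samples (20 revolutions at 360 samples per revolution) of one signal into a single training value. The function name `reduce_run` and the synthetic trace are assumptions made for the example; they are not measured data.

```python
def reduce_run(samples, keep_last=5000):
    """Average the last `keep_last` samples of one signal from a run."""
    tail = samples[-keep_last:]
    return sum(tail) / len(tail)

REVOLUTIONS = 20
SAMPLES_PER_REV = 360
n_samples = REVOLUTIONS * SAMPLES_PER_REV  # 7200 samples per run

# Synthetic (hypothetical) force trace: a start-up transient of zeros
# followed by a steady level, used only to exercise the function.
trace = [0.0] * (n_samples - 5000) + [100.0] * 5000
steady_force = reduce_run(trace)
```

The same reduction would be applied to each of the three force signals and the two acoustic emission RMS signals of a run.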







This data set and the chemical composition, hardness, tensile strength, and reduc- 
tion of area were used for training the neural network. The steel makers performed this 
material characterization as part of the AISI tests. The steels supplied for this research 
and those tested by AISI are from the same heats. The steels from these heats are in 
short supply, so to preserve data for any future analysis the following data was also 
collected at the time of the experiments but not analyzed in this research. 

• 2 acoustic emission sensor readings covering 2 frequency ranges: 50 to 
400 Hz and 100 to 900 Hz. 

• Tool wear 

The results of the short-term test were split into 2 sets: the training set and the 
testing set. The tool life in the training and testing sets was determined by consulting 
the Taylor Tool Life curves as developed by the AISI Bar Applications Group, who 
performed the long-term tests. 



3 The Neural Network Model Design 



The type of neural network chosen for this research was the Multi-Layered 
Feedforward trained with BackPropagation (MLFBP) network. Other models, such as 
radial basis functions, will be experimented with in the future in an attempt to 
optimize the NN model. 

The first phase of training used chemical composition and physical properties of 
the steel to predict tool life. The second phase used the short-term test results as pre- 
dictive attributes. The phase 1 data set consisted of 120 data vectors composed of one 
repeat of each of the short-term tests. 

The 120 data vectors were split into 2 data sets, the training and testing sets, as 
follows. The vectors were sorted by steel grade, then by speed. The result is an 
ordered list with the data vectors within each steel grade sorted in ascending order 
of speed. The testing set was obtained by removing the 2nd and (i-1)th vectors, where 
i is the number of speeds contained in each steel grade. Thus the testing set speeds 
are contained within the range of the training set speeds. The result is a training 
set consisting of 96 vectors and a testing set of 24 vectors. 
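
The split rule above can be sketched as follows. The grade list is the one given in Section 2; the speed values and the function name `split_train_test` are hypothetical and serve only to show that holding out the 2nd and (i-1)th vectors of each grade yields 96 training and 24 testing vectors.

```python
from collections import defaultdict

GRADES = ["1045", "1070", "1117", "1141", "1215", "1345",
          "4118", "4320", "4340", "5160", "8620BC", "8620IC"]

def split_train_test(vectors):
    """vectors: list of (grade, speed) tuples, one per short-term test."""
    by_grade = defaultdict(list)
    for v in vectors:
        by_grade[v[0]].append(v)
    train, test = [], []
    for grade in sorted(by_grade):
        rows = sorted(by_grade[grade], key=lambda v: v[1])  # ascending speed
        i = len(rows)
        held_out = {1, i - 2}  # zero-based positions of the 2nd and (i-1)th
        for pos, row in enumerate(rows):
            (test if pos in held_out else train).append(row)
    return train, test

# Hypothetical speeds; the actual ten cutting speeds are not listed here.
SPEEDS = [100 + 50 * s for s in range(10)]
vectors = [(g, v) for g in GRADES for v in SPEEDS]
train, test = split_train_test(vectors)
```

Because the lowest and highest speeds of every grade stay in the training set, every testing speed is an interpolation rather than an extrapolation.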

The training and testing sets for Phase II were constructed similarly. The resultant 
training set contains 344 data vectors and the testing set contains 94 data vectors. As 
the project evolves, more sophisticated techniques and extensive experimentation will 
be applied for choosing an optimum training set. 

The AISI chemical and physical analysis consists of data on 24 characteristics of the 
steels. The use of all this data would require a larger data set than is possible. A 
subset of 13 attributes was chosen, ad hoc, based on a human expert's advice on which 
were more relevant. A more careful analysis of the attributes will be done in the 
future to determine the most relevant ones. Phase I training had 14 input nodes: speed 
plus the 13 chemical and physical characteristics. Phase II training had 19 input 
nodes: the 14 Phase I nodes plus 3 force and 2 acoustic emission nodes. 







In addition to choosing the training and testing sets, there are many parameters 
and/or controls or factors to set when training a neural network. A designed experi- 
ment was used to discover the optimum set of these parameters to use for training this 
particular data set. We experimented with seven possible significant parameters at two 
levels that had to be set. The seven factors are the number of hidden nodes, the learn- 
ing rule, the transfer function, the learning schedule, the epoch size, whether or not to 
use bipolar inputs of binary learning. 

A fractional factorial design was used to discover the optimum parameters to use 
for training the neural network. The response used for the analysis was the root-mean- 
square error between the desired and neural network output. This approach to opti- 
mizing the neural model can be generalized and we believe is a novel approach. Due 
to paper size limitations, we will be reporting on our factorial design optimization of a 
NN technique in a forthcoming paper. 
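
For readers unfamiliar with two-level fractional factorial designs, the sketch below builds one common choice for seven factors, a 2^(7-4) design with 8 runs. The fraction and the generators (D = AB, E = AC, F = BC, G = ABC) are assumptions made for illustration; the design actually used in this research is not specified here. Levels are coded as -1 and +1.

```python
from itertools import product

def fractional_factorial_7_4():
    """8-run, two-level design for 7 factors (assumed generators)."""
    runs = []
    for a, b, c in product((-1, 1), repeat=3):
        # Factors D..G are aliased with interactions of A, B, C.
        runs.append((a, b, c, a * b, a * c, b * c, a * b * c))
    return runs

def main_effect(design, responses, factor):
    """Mean response at the high level minus mean response at the low level."""
    high = [r for run, r in zip(design, responses) if run[factor] == 1]
    low = [r for run, r in zip(design, responses) if run[factor] == -1]
    return sum(high) / len(high) - sum(low) / len(low)

design = fractional_factorial_7_4()

# Synthetic responses that depend only on the first factor.
responses = [2.0 * run[0] + 5.0 for run in design]
effect_a = main_effect(design, responses, 0)
effect_b = main_effect(design, responses, 1)
```

In the setting of this paper the response of each run would be the root-mean-square error of the network trained with that run's settings.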

The best neural network discovered for phase one had 30 hidden nodes. The Ex- 
tended Delta Bar Delta learn rule was discovered to perform better than either the 
Delta or Normalize Cumulative Delta learn rules. The Extended Delta Bar Delta learn 
rule yielded a more accurate neural network using fewer hidden nodes. 



Figure 1: Phase II Comparison of Delta vs. EDBD Learn Rules (plot title: Neural Network Training, Phase II) 



When using the short-term test attributes as input variables, the best performing 
EDBD neural network had only 15 hidden nodes. A comparison of the performance of 
the Delta and EDBD learn rules for Phase II is summarized in Figure 1. 

A comparison of the performance of the Delta and EDBD learn rules for Phase I is 
summarized in Figure 2. 








Figure 2: Phase I Comparison of Delta vs. EDBD Learn Rules (plot title: Neural Network Training, Phase I) 



The EDBD learn rule produced better neural networks with fewer hidden nodes in 
both cases. The network with the lowest RMS error was saved every 500 training 
cycles. 



4 Discussion of Results 

Neural network training was performed in two phases. The training in Phase I was 
performed with the chemistry and material properties of the steels as inputs, and the 
tool life as the output. Phase II used the Phase I data and input variables whose values 
are determined from a short-term machinability test. In the training data, the tool life 
was computed from Taylor's tool life equation, fitted by a regression analysis of 
the AISI-BAG long-term test results. 

The testing set and training set data vectors were presented to the neural network to 
predict the tool life as a final check on the neural network's performance. Figures 18 
through 29 are graphs of the original long-term test results with the Phase I neural 
network predictions included. The regression line for Taylor's tool life equation is 
included along with a 95% confidence region. Figures 30 through 41 are graphs of 
the original long-term test results with the Phase II neural network predictions 
included. 
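
For reference, Taylor's tool life equation has the standard form V·T^n = C, where V is the cutting speed, T the tool life, and n and C are constants for a given tool and work material. The sketch below fits n and C by linear regression on the log-transformed data and then evaluates the tool life at a given speed. The function names and the synthetic data are assumptions for illustration only; they are not the AISI-BAG regression or its results.

```python
import math

def fit_taylor(speeds, lives):
    """Least-squares fit of log T = a + b log V; returns (n, C) for V * T**n = C."""
    xs = [math.log(v) for v in speeds]
    ys = [math.log(t) for t in lives]
    m = len(xs)
    mx, my = sum(xs) / m, sum(ys) / m
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    a = my - b * mx
    n = -1.0 / b           # slope of log T versus log V is -1/n
    C = math.exp(a * n)    # intercept is log(C) / n
    return n, C

def tool_life(speed, n, C):
    """Tool life predicted by V * T**n = C."""
    return (C / speed) ** (1.0 / n)

# Synthetic data generated from n = 0.25, C = 400 (hypothetical values).
speeds = [100.0, 150.0, 200.0, 300.0]
lives = [(400.0 / v) ** 4 for v in speeds]
n_fit, C_fit = fit_taylor(speeds, lives)
```

A tool life computed this way for each steel grade and speed would serve as the desired output of a training vector.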








5 Conclusions 

A neural network to predict tool life was attempted. Design of Experiments methods 
were found to be very useful in finding the appropriate settings for the network. The 
Extended Delta Bar Delta learn rule was discovered to perform better for this problem 
than either the Delta or Normalize Cumulative Delta learn rules. The Extended Delta 
Bar Delta learn rule yielded a more accurate neural network using fewer hidden nodes. 

The accuracy of the neural networks was good. The Phase I neural network 
outperformed the Phase II neural network with the Delta learn rule. The situation was 
reversed with the EDBD learn rule. The additional information provided by the force 
and acoustic emission data was apparently helpful to the EDBD network. 

The success of the neural network as trained here opens up many exciting 
possibilities. AISI-BAG is starting a second round of machinability testing. Performing 
the short-term test on these new steels plus the long-term test results might provide 
enough data to fulfill the promise presented here. 



References 

1. Sekiguchi, H., Komoriya, H., and Deguchi, T., “Reasoning of the Machinability 
and Cutting Condition Based on the Chemical Compositions of Material (1st 
Report) - Influence of the Chemical Composition on Machinability,” Seimitsu 
Kogakkai shi = Journal of the Japan Society of Precision Engineering, Vol. 62, 
No. 7, pp. 1004, 1996 

2. Dimla, D.E. Jr., Lister, P.M., and Leighton, N.J., “Neural Network Solutions to 
the Tool Condition Monitoring Problem in Metal Cutting - A Critical Review of 
Methods,” International Journal of Machine Tools and Manufacture, Vol. 37, 
No. 9, pp. 1219-1241, 1997 

3. Zheng, L., Luo, Z.B., Wu, Y., Xu, J.Q., and Zhang, B.P., “Research and Devel- 
opment on Synthetic Cutting Tool Monitoring With AE Signal,” Transactions of 
the North American Research Institution of SME 1990, Vol. 18, pp. 360-365, 
1990 

4. Yee, Kenneth W., “Material Dependency of Chip-Form Detection Using Acoustic 
Emission,” Transactions of the North American Research Institution of SME 
XV, Vol. 15, pp. 458-462, 1987 

5. Das, S., Islam, R., and Chattopadhyay, A.B., “A Simple Approach for On-line 
Tool Wear Monitoring Using the Analytic Hierarchy Process,” Proc. Instn. 
Mech. Engrs., Vol. 211 Part B, pp. 19-27, 1997 

6. Silva, R.G., Reuben, R.L., Baker, K.J., and Wilcox, S.J., “Tool Wear Monitor- 
ing of Turning Operations by Neural Network and Expert System Classification 
of a Feature Set Generated From Multiple Sensors,” Mechanical Systems and 
Signal Processing, Vol. 12, No. 2, pp. 319-332, 1998 

7. Kurapati, V., Zhou, M., Caudill, R.J., “Design of Artificial Neural Networks for 
Tool Wear Monitoring,” Journal of Intelligent Manufacturing, Vol. 8, pp. 215- 
226, 1997 







8. Li X., Yao P., Zhejun, Y., “On-Line Tool Condition Monitoring System with 
Wavelet Fuzzy Neural Network,” Journal of Intelligent Manufacturing, Vol. 8, 
pp. 271-276, 1997 

9. Chao, P., Yi, H., Yeong, D., “An improved neural network model for the pre- 
diction of cutting tool life,” Journal of Intelligent Manufacturing, Vol. 8, No. 2, 
pp. 107-115, 1997 

10. Tarng, Y.S., Hwang, S.T., and Wang, Y.S., “A neural network controller for 
constant turning force,” International Journal of Machine Tools and Manufacture, 
Vol. 34, No. 4, pp. 453-460, 1994 

11. Azouzi, R., and Guillot, M. “On-Line Prediction of Surface Finish and Dimen- 
sional Deviation in Turning Using Neural Network Based Sensor Fusion,” In- 
ternational Journal of Machine Tools and Manufacture, Vol. 37, No. 9, pp. 
1201-1217, 1997 

12. Rahman, M., Zhou, Q., and Hong, G. S. “On-line Cutting State Recognition in 
Turning Using a Neural Network,” The International Journal, Advanced 
Manufacturing Technology, Vol. 10, No. 2, pp. 87-92, 1995 

13. Zhou, Q., Hong, G.S., and Rahman, M., “A new tool life criterion for tool con- 
dition monitoring using a neural network,” Engineering Applications of Artifi- 
cial Intelligence, Vol. 8, No. 5, pp. 579-588, 1995 

14. Purushothaman, S., and Srinivasa, Y.G., “A Procedure for Training an Artificial 
Neural Network With Application to Tool Wear Monitoring,” International 
Journal of Production Research, Vol.36, No. 3, pp. 635-651, 1998 

15. Li, S., and Elbestawi, M.A., “Fuzzy Clustering for Automated Tool Condition 
Monitoring in Machining,” Mechanical Systems and Signal Processing, Vol. 10, 
No. 5, pp. 533-550, 1996 

16. Elanayar, S., and Shin, Y.C., “Robust Tool Wear Estimation With Radial Basis 
Function Neural Networks,” Journal of Dynamic Systems, Measurement and 
Control, Vol. 117, No. 4, pp. 459-467, 1995 

17. Nikolopoulos, C., Expert Systems, Introduction to First and Second Generation 
and Hybrid Knowledge Based Systems, Marcel Dekker, New York (1997) 




Performance of MGMDH Network on Structural 
Piecewise System Identification 

Ali K. Setoodehnia (1) and Hong Li (2) 

(1) Computer Science, McNeese State University 
Lake Charles, Louisiana 70609 
setoodeh@mail.mcneese.edu 

(2) Conoco, Lake Charles, Louisiana 70601 
Li.Hong@usa.conoco.com 



Abstract. This paper addresses the capability of the Modified Group Method 
of Data Handling (MGMDH) for approximating continuous and piecewise 
continuous functions. The MGMDH is used to construct the mathematical 
structure of a dynamic system that is observable only from its input and output 
and has a large number of input variables. The paper presents the theory and 
simulation results for the estimation of a piecewise continuous polynomial 
function. The output performance evaluation is included. 

Keywords: GMDH, Modeling, Self-Organized Network, Multi-Layer 
Perceptron, Prediction, Identification 



1 Introduction 

The Group Method of Data Handling (GMDH) network is a Least Square 
Estimation (LSE) based method originally developed by Ivakhnenko at the Institute 
of Cybernetics of the Ukrainian Academy of Sciences in 1966 [4]. This network models 
the input/output relationship of an unknown system using a multilayer 
perceptron-type structure. Fig. 1 (all of the figures in this paper are listed at the 
end) shows the architecture of a basic GMDH unit. This is a self-organized method 
based on sorting out gradually complicated models and evaluating them by an external 
criterion on data samples. The training and testing processes of GMDH are presented 
in this paper. 

In many systems there are processes whose future behavior or input/output 
relations need to be known. In the case of structural identification, suppose that 
the system is complex and its structure is unknown. As long as the system is both 
state controllable and state observable, all of the dynamic modes of the system will 
be contained in the input-output data base if the system input and output are 
sampled correctly. For a system that has no prior system information available, a 
method is needed to find the mapping between the system input and output. The 
problem with system modeling methods [1,8] is the size of the possible system 
function space, which creates a need for the algorithm to rely on large amounts of 
prior knowledge or on strong assumptions about the process, which might not hold. It 
is therefore necessary to create an algorithm that does not depend on this type of 
knowledge but relies mostly on the observed data behavior, and that can incorporate 
this knowledge when it is available. This leads to the adaptation of the multilayer 
perceptron network. Many papers and articles have discussed how artificial neural 
networks perform on dynamic systems [3]. However, this paper presents the 
multilayer self-organized perceptron method for cases where the size of the input 
vector and data set becomes very large. 

R. Loganantharaj et al. (Eds.): IEA/AIE 2000, LNAI 1821, pp. 731-740, 2000. 
© Springer-Verlag Berlin Heidelberg 2000 



2 Basic Principle of GMDH 

Referring to Fig. 1, each processor element in the L-th layer of the network 
implements a quadratic polynomial of two adjacent inputs from the (L-1)-th layer. A 
general description polynomial is shown below: 

$$Y(P,L,I,J) = A(1,L,I,J) + A(2,L,I,J)\,X(I) + A(3,L,I,J)\,X(I)^2 + A(4,L,I,J)\,X(J) + A(5,L,I,J)\,X(J)^2 + A(6,L,I,J)\,X(I)\,X(J) \qquad (1)$$

where P and L are the pattern number and the layer number respectively, and I and J 
are the adjacent inputs. The total number of intermediate outputs is P(P-1)/2. Each 
of the intermediate outputs is considered as a crude approximation to the actual 
output, and coefficients for the k-th element are usually determined by a least square 
method [4,5,6,9,10]. The normal equation is shown below [1]: 



$$A = [X^t X]^{-1} X^t Y \qquad (2)$$

In order to obtain a unique solution, the matrix $[X^t X]$ must be of full rank. In case 
of singularity we can add a small random signal [a] to overcome the inversion problem. 
The objective is to keep only the best of Y(P,L,I,J). The complete selection procedure 
is presented in the following pages. 
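
As an illustration of how the coefficients of one processor element can be obtained from the normal equation, the following sketch builds the six-term design matrix of the quadratic polynomial in the order of equation (1) and solves for the coefficients by Gaussian elimination. It uses only the Python standard library; the function names and the synthetic data are assumptions, and a singular matrix is not handled.

```python
def design_row(xi, xj):
    """Terms of the quadratic polynomial, in the order of equation (1)."""
    return [1.0, xi, xi * xi, xj, xj * xj, xi * xj]

def solve(M, b):
    """Solve M a = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(M, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    a = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(aug[r][c] * a[c] for c in range(r + 1, n))
        a[r] = (aug[r][n] - s) / aug[r][r]
    return a

def fit_partial_description(xi, xj, y):
    """Coefficients A = (X^t X)^(-1) X^t Y for one pair of inputs."""
    X = [design_row(u, v) for u, v in zip(xi, xj)]
    k = len(X[0])
    XtX = [[sum(row[p] * row[q] for row in X) for q in range(k)] for p in range(k)]
    XtY = [sum(row[p] * t for row, t in zip(X, y)) for p in range(k)]
    return solve(XtX, XtY)

# Synthetic example: y = 1 + 2*xi - xi*xj on a 5 x 5 grid.
grid = [-1.0, -0.5, 0.0, 0.5, 1.0]
xi = [u for u in grid for v in grid]
xj = [v for u in grid for v in grid]
y = [1.0 + 2.0 * u - u * v for u, v in zip(xi, xj)]
coeffs = fit_partial_description(xi, xj, y)
```

Solving the linear system directly avoids forming the explicit inverse, which is usually preferable numerically.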

Referring to Fig. 1, using all the data for training may cause an overfitting 
problem and requires more computation and memory space. The problem is that the 
sum of squares of errors would get larger if the polynomial were tested on a new set 
of observations. Therefore, we separate the data almost equally into a training set 
and a testing set for our modified GMDH model to overcome the overfitting problem. 
The following section provides one method of data separation. 



3 Data Separation 

The Hypercube (H) method is proposed for separating the data into training 
and testing sets. It discovers the groups, or clusters, of points that lie close 
together in data space, and then randomly divides the data from each cluster into 
training and testing sets. Other clustering methods, such as the ellipsoid method, 
were used before, but the ellipsoid method would not cover all the points, might cause 
overlapping, and needs more computation. 

The analysis of the H method is as follows: Suppose that the input data set M 
is a bounded subset of $R^n$ in the sense of the Euclidean metric; then there exist a 
minimum value $m_i$ and a maximum value $M_i$ such that for all k, $m_i \le x_i(k) \le M_i$, 
where i = 1, 2, 3, ..., n. Let $l$ be the number of disjoint equal intervals; then the 
hypercubes (H) can be defined as: 

$$H_{k_1,k_2,\ldots,k_n} = \{(x_1, x_2, \ldots, x_n) \mid m_i + (k_i - 1)(M_i - m_i)/l \le x_i < m_i + k_i (M_i - m_i)/l\} \qquad (3)$$

where $k_i = 1, 2, \ldots, l$. 
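
The hypercube definition above, together with the random half-and-half split within each hypercube, can be sketched as follows. The function names are assumptions. Points lying exactly on the upper bound of a dimension are placed in the last interval, which is a choice made for this sketch so that every point is covered.

```python
import random
from collections import defaultdict

def hypercube_index(x, mins, maxs, l):
    """Index tuple (k_1, ..., k_n), each in 1..l, of the cell containing x."""
    ks = []
    for xi, mi, Mi in zip(x, mins, maxs):
        if Mi == mi:
            ks.append(1)  # degenerate dimension: a single interval
            continue
        k = int((xi - mi) / (Mi - mi) * l) + 1
        ks.append(min(k, l))  # x_i == M_i goes into the last interval
    return tuple(ks)

def hypercube_split(points, l, seed=0):
    """Split points into training and testing halves, cell by cell."""
    n = len(points[0])
    mins = [min(p[i] for p in points) for i in range(n)]
    maxs = [max(p[i] for p in points) for i in range(n)]
    cells = defaultdict(list)
    for p in points:
        cells[hypercube_index(p, mins, maxs, l)].append(p)
    rng = random.Random(seed)
    train, test = [], []
    for key in sorted(cells):
        members = cells[key][:]
        rng.shuffle(members)
        half = (len(members) + 1) // 2
        train.extend(members[:half])
        test.extend(members[half:])
    return train, test

# Hypothetical two-dimensional data with l = 2 intervals per dimension.
points = [(x, y) for x in (0.0, 0.1, 0.9, 1.0) for y in (0.0, 0.1, 0.9, 1.0)]
train, test = hypercube_split(points, 2)
```

With l intervals per dimension there are up to l^n hypercubes, so for high-dimensional inputs only a few dimensions would be partitioned in practice.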

Equation (3) covers the complete data domain $\Omega$ without overlapping. The 
following theorem gives the bounded error for the GMDH model. 

Theorem-1: For any bounded input/output stable system, apply the Hypercube method 
to the input data M and select half of the data from each H as training and the other 
half as testing; then the error of the GMDH model is bounded by $C\delta + \alpha$, where C 
is a constant, $\delta$ is the largest diameter of all the cubes, and $\alpha$ is the minimal 
threshold. 

Proof: After the training and testing procedure of the GMDH network, suppose the 
model is 

$$Y = G(X), \quad \text{for all } X \in M, \quad |G(X) - Y_m(X)| < \alpha$$

For any $X \in (\Omega - M)$, there exists $X_0 \in T$ such that $|X - X_0| < \delta$. By the assumption of 
the stability of the model, there is $L^*$ such that 

$$|Y_m(X_0) - Y_m(X)| < L^* |X_0 - X|$$

and since G(X) is a polynomial, there is a constant L such that 

$$|G(X) - G(X_0)| < L |X_0 - X|$$

therefore, with $C = L + L^*$, 

$$|G(X) - Y_m(X)| < L|X - X_0| + \alpha + L^*|X_0 - X| < (L + L^*)|X - X_0| + \alpha < C\delta + \alpha.$$



4 Piecewise GMDH Network 

In order to reduce the complexity of the basic network unit and reduce the 
output surface error when the number of patterns is very large or the function is 
discontinuous, a decomposition GMDH model is proposed to construct the system 
model. Figure-2 shows the block diagram of this model. The model is described as 
follows: Assume the input data set as 

$$X(k) = [x_1(k), \ldots, x_n(k)], \quad k = 1, \ldots, p$$

where n is the number of input variables and p is the number of points, and the output 
measurement as 

$$Y_m = [y(1), \ldots, y(p)]^t$$

Now, if p is very large, divide the data set into m boxes, denoted by $G_1, G_2, \ldots, G_m$, 
where $G_i$ contains the inputs 

$$\{X(k) \mid k = (i-1)p/m + 1, \ldots, ip/m\}$$

where $i = 1, \ldots, m$, and the output measurements 

$$Y^i = [y((i-1)p/m + 1), \ldots, y(ip/m)]^t$$

Apply the basic modified GMDH network unit (Figure-1) to each $G_i$ (Figure-2) to find 
the polynomial model $F_i$; then 

$$F(x_1, \ldots, x_n) = F_i(x_1, \ldots, x_n) \quad \text{if } X \in G_i$$

is the piecewise polynomial model for the system. 

Let $F_0$ be a polynomial which is constructed with only one GMDH network 
unit, and let $EF_0$ be its output error. Also, denote the error from each box $G_i$ as $EF_i$. 
Define the error of the final model of the system as 

$$EF = \sum_{i=1}^{m} EF_i$$

Theorem-2: If EF is the error of the piecewise polynomial GMDH model and $EF_0$ is the 
error of a single polynomial GMDH model, then $EF < EF_0$. 

Proof: (omitted) 

The complete algorithm for the Modified GMDH Network unit is described in the next 
section as the Self-Organized Algorithm. 
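
The box decomposition of this section can be sketched as below: the p samples are cut into m contiguous boxes, and the piecewise model dispatches each input to the model of its box. The function names, the membership rule, and the two toy models are hypothetical; in the paper each box model would be a polynomial found by a GMDH unit.

```python
def split_into_boxes(X, Y, m):
    """Split p samples into m contiguous boxes (p divisible by m is assumed)."""
    p = len(X)
    size = p // m
    boxes = []
    for i in range(1, m + 1):
        # Zero-based slice covering k = (i-1)p/m + 1, ..., ip/m
        lo, hi = (i - 1) * size, i * size
        boxes.append((X[lo:hi], Y[lo:hi]))
    return boxes

def piecewise_model(models, membership):
    """F(x) = F_i(x) for the box i that contains x."""
    def F(x):
        return models[membership(x)](x)
    return F

# Hypothetical example with two boxes selected by the sign of x[1].
models = [lambda x: x[0] + 1.0, lambda x: -x[0]]
membership = lambda x: 0 if x[1] > 0 else 1
F = piecewise_model(models, membership)

X = [(float(k), 1.0) for k in range(12)]
Y = [F(x) for x in X]
boxes = split_into_boxes(X, Y, 3)
```

Each box is fitted independently, so the per-box models can be trained in any order or in parallel.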



5 Self-Organized Algorithm 

Step-1: Separate the data into a training set and a testing set using the hypercube 
method; this classifies the whole domain into distinct clusters. From each hypercube, 
randomly select half of the data as the training set and the other half as the testing 
set. 

Step-2: Select two variables at a time for all adjacent selected inputs as shown in 
Figure-1. 

Step-3: Generate an optimal description polynomial for Step-2 as shown below. Use 
the least square estimation to calculate the coefficients for each node at each layer. 

$G_1 = a_0 + a_1 x_i x_j$ 
$G_2 = a_0 + a_1 x_i + a_2 x_i x_j$ 
$G_3 = a_0 + a_1 x_i x_j + a_2 x_i^2 + a_3 x_j^2$ 
$G_4 = a_0 + a_1 x_i + a_2 x_j + a_3 x_i x_j + a_4 x_i^2 + a_5 x_j^2$ 

Step-4: Using the training and testing data sets, compute the total relative error 
for each partial description generated in Step-3 for each node in the network as: 

$$SSE_{tr}(j) = \sum_i [Y_{tr}(i) - Y(i,j)]^2 \qquad (4)$$

for the training set and 

$$SSE_{ts}(j) = \sum_i [Y_{ts}(i) - Y(i,j)]^2 \qquad (5)$$

for the testing set; then 

$$SSE = SSE_{tr} + SSE_{ts} \qquad (6)$$

is the total sum of the error squares. 







Step-5: Select the partial description from Step-3 with the smallest error for 
all nodes in the current layer. However, the F-test can be used to check the 
significance of each coefficient of each polynomial. 

Step-6: Select all Partial Descriptions (PDs) whose Mean Square Error (MSE) is less 
than the threshold for the current layer. The threshold is defined as the mean value 
of the total sum of the error squares (6). However, the F-test can be applied to all 
selected nodes to check the equality. 

Step-7: Check whether the minimum MSE among all generated PDs of the current layer 
is less than the assigned accuracy. If it is, the corresponding PD becomes the best 
description; then go to Step-9. 

Step-8: Check for convergence. If it holds, use the intermediate outputs as new 
inputs, go to Step-2, and repeat the process; otherwise, take the PDs from the 
previous layer as the best description. 

Step-9: Reconstruct the complete description and stop. Starting at the layer 
corresponding to the best PD, the scheme traces back the variable and PD relationship 
from the higher layer to the lower layer, down to the original input variables. 
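
The error computation of Step-4 and the threshold selection of Step-6 can be illustrated with a short sketch. It assumes that each node is compared through its total sum of squared errors and that the threshold is the mean of those totals over the layer; this is one reading of Step-6, and the function names and numbers are hypothetical.

```python
def sse(targets, predictions):
    """Sum of squared errors between targets and predictions."""
    return sum((t - y) ** 2 for t, y in zip(targets, predictions))

def select_nodes(train_targets, test_targets, node_outputs):
    """node_outputs[j] = (train_predictions, test_predictions) for node j.

    Returns the indices of the nodes that pass and the layer threshold.
    """
    totals = [sse(train_targets, tr) + sse(test_targets, ts)
              for tr, ts in node_outputs]
    threshold = sum(totals) / len(totals)  # mean of the total SSE values
    selected = [j for j, e in enumerate(totals) if e < threshold]
    return selected, threshold

# Hypothetical layer with three nodes.
train_targets, test_targets = [1.0, 2.0], [3.0]
node_outputs = [
    ([1.0, 2.0], [3.0]),  # exact fit, total SSE 0
    ([2.0, 3.0], [4.0]),  # off by one everywhere, total SSE 3
    ([3.0, 2.0], [3.0]),  # one training point off by two, total SSE 4
]
selected, threshold = select_nodes(train_targets, test_targets, node_outputs)
```

The outputs of the selected nodes would then become the inputs of the next layer, as in Step-8.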



6 Output Performance Evaluation 

Once the description of the system has been determined, the performance 
confidence interval can be used to evaluate the performance of the constructed model. 
It determines the confidence coefficient that gives the probability of the true value 
of the parameter falling within the true output. The confidence interval can be 
established through the probability distribution law of some appropriate statistic 
functions that estimate the value of the parameter from a given sample [2,7]. The steps 
for finding the Cumulative Distribution Function are as follows: 

Step-1: Observe the input/output data and estimate the model via MGMDH. 

Step-2: Compute the error difference between the actual value of the output of the 
system and the estimated value from MGMDH. 

Step-3: Use a random number generator to select m out of the n error differences: 
$e_i^*$, i = 1, 2, ..., m. 

Step-4: Calculate the mean value of the error: $E(e_j^*)$, j = 1, 2, ..., P. 

Step-5: Repeat the above steps P times and assign a threshold $\alpha$: 0.05, 0.1, ..., 0.95. 
Check whether the mean value of the error is less than $\alpha$, find the total number T of 
mean values that pass $\alpha$, and calculate the probability of the error as 
Prob(mean error) = T/P. 

Step-6: Repeat the process for different $\alpha$. 
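
The resampling procedure in the steps above can be sketched as follows. The sketch assumes that the m errors are drawn without replacement and that absolute errors are supplied; neither detail is stated in the text, and `random.choices` could be substituted for a bootstrap with replacement. The function name and the synthetic errors are hypothetical.

```python
import random

def mean_error_cdf(errors, m, P, alphas, seed=0):
    """Empirical probability that the mean of m resampled errors is below alpha."""
    rng = random.Random(seed)
    means = []
    for _ in range(P):
        sample = rng.sample(errors, m)  # m out of n, without replacement
        means.append(sum(sample) / m)
    # For each threshold alpha, T/P with T the number of means below alpha.
    return {a: sum(1 for mu in means if mu < a) / P for a in alphas}

# Synthetic absolute errors, used only to exercise the function.
gen = random.Random(1)
errors = [abs(gen.uniform(-1.0, 1.0)) for _ in range(200)]
alphas = [0.05 * k for k in range(1, 20)]  # 0.05, 0.10, ..., 0.95
cdf = mean_error_cdf(errors, m=20, P=500, alphas=alphas)
```

Plotting `cdf` against the thresholds gives the kind of cumulative distribution curve that the section describes.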







7 Updating GMDH Model 

The existing reconstructed GMDH model can be updated parametrically 
once a new observation becomes available. Given a multi-layer description of 
partial polynomials generated by GMDH, the recursive least square estimation 
(Kalman filter) [1,8] can be used to update the parameters for one single additional 
data point at each partial description in each layer, from low to high. 
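
A minimal sketch of one recursive least squares update is given below, using the standard textbook recursion without a forgetting factor. The function name, the initial covariance, and the toy linear model are assumptions for illustration; for a GMDH partial description the regressor `x` would hold the polynomial terms of that node.

```python
def rls_update(a, P, x, y):
    """One recursive least squares step for the model y = x . a.

    a: current parameter list, P: current covariance matrix (symmetric),
    x: regressor list for the new data point, y: new observed output.
    """
    n = len(a)
    Px = [sum(P[r][c] * x[c] for c in range(n)) for r in range(n)]
    denom = 1.0 + sum(x[r] * Px[r] for r in range(n))
    K = [v / denom for v in Px]                      # gain vector
    err = y - sum(x[r] * a[r] for r in range(n))     # prediction error
    a_new = [a[r] + K[r] * err for r in range(n)]
    # P - K x^t P, using the symmetry of P so that x^t P equals (P x)^t
    P_new = [[P[r][c] - K[r] * Px[c] for c in range(n)] for r in range(n)]
    return a_new, P_new

# Toy example: recover y = 2 + 3*u one observation at a time.
a = [0.0, 0.0]
P = [[1e6, 0.0], [0.0, 1e6]]  # large initial covariance: little prior knowledge
for u in range(10):
    a, P = rls_update(a, P, [1.0, float(u)], 2.0 + 3.0 * u)
```

Each update costs on the order of n^2 operations for n parameters, so no stored data need to be refitted when a new observation arrives.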



8 Simulation 

In this section we present some of the simulation results associated with 
the mathematical analysis given in this paper. This simulation shows the results 
of hypercube data separation combined with the piecewise GMDH model for the 
following example. 

An arbitrary function is chosen as: 

$$Y = X_3 X_5 + X_7 X_9, \quad \text{if } X_{10} > 0$$
$$Y = X_2 X_4 + X_6 X_8, \quad \text{if } X_{10} < 0$$

where $X_1, \ldots, X_9$ are input variables. The training data set is chosen randomly from 
the interval [-1, 1] for all variables. In this simulation, we first read the data 
vectors and separate them by $X_{10}$ into two regions. Secondly, the hypercube data 
separation was applied to each region to separate the data into small clusters by 
simply applying equation (3). Here, we used 2 for $l$ as the number of disjoint equal 
intervals for positive and negative values, and eight clusters were obtained by using 
the three variables that have the longest intervals in the domain [-1, 1]. The total 
number of points in each hypercube was in the range of 40 to 60. Finally, the GMDH 
network was trained following the training process given above, using the 
self-selection threshold to filter out those intermediate outputs that are less useful 
for prediction. Tables 1 and 2 show the results of the training process for two basic 
GMDH units for $X_{10} > 0$ and $X_{10} < 0$ respectively. Also, Table 3 shows that the total 
relative mean square error using two basic units is less than using one basic unit. 
Some of the notations in the tables are defined as follows: E(i) is the error that 
passed the threshold; the threshold (TH) is the relative mean square error; $X_i$, $X_j$ 
are the inputs; and P is the type of polynomial (G1, G2, G3, or G4). P is indicated 
only in the first layer in all the tables because of space. In the i-th layer the 
relative mean square error is given as follows: 

$$E(i) = \sum_{j=1}^{P} \left[ \frac{D(j) - Y(j)}{D(j)} \right]^2$$

$$T = \sum_{i=1}^{N} E(i)$$

$$TH = T/N$$

where i, j, P and N are the node, the pattern, the total number of patterns, and the 
total number of nodes respectively at each layer. For a convergent solution, the 
threshold in the L-th layer should be less than in the (L-1)-th layer, so that the 
threshold becomes progressively smaller as it passes to a succeeding level of the 
MGMDH. 







Table 1 MGMDH and HDS constructed model for X10 > 0: 

L1      E1 Th    L2      E2 Th    L3      E3 Th    L4      E4 
XiXj    10.39            4.96             1.4 
3,5     5.34     1,4     0.08     1,2     0.08     1,2     0.068 
3,9     10.38    2,4     4.84     1,3     0.06 
3,10    9.96     3,4     4.11 
7,9     5.19 

Table 2 MGMDH and HDS constructed model for X10 < 0: 

L1      E1 Th    L2      E2 Th    L3      E3 Th    L4      E4 
XiXj    10.56    XiXj    5.09     XiXj    1.87 
2,4     5.75     1,3     0.53     1,3     0.53     1,2     0.53 
5,10    10.55    2,3     4.64     1,3     0.53 
6,7     4.69     3,4     4.67 
6,10    10.27 

The approximation model is based on the two basic GMDH units for the piecewise 
polynomial function. Moreover, since hypercube data separation is better than 
random-selection data separation, as mentioned before, we used the hypercube data 
separation, and the constructed multiple GMDH network was tested with a new data 
set based on the functions defined as follows: 

$X_1 = 0.8 \sin(2\pi k/N)$ 

$X_2 = 0.75 \cos(2\pi k/N)$ 

$X_3 = 0.5 \sin(\pi k/N) + 0.26 \cos(2\pi k/N)$ 

$X_4 = -0.99 + 1.5\,k/N$ 

$X_5 = 0.62 \sin(2\pi k/N) + 0.23 \sin(\pi k/N)$ 

$X_6 = 1/(1 + k/N)$ 

$X_7 = 0.8 \sin(4\pi k/N)$ 

$X_8 = 0.85 \cos(4\pi k/N)$ 

$X_9 = 0.25 \sin(2\pi k/N) + 0.46 \cos(2\pi k/N)$ 

$X_{10} = \sin(2\pi k/N)$ 
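
The test signals listed above can be generated with a few lines of Python. The sketch assumes that k runs from 0 to N-1 with N = 800, which is not stated explicitly, and it uses the target function Y = X3·X5 + X7·X9 for X10 > 0 and Y = X2·X4 + X6·X8 otherwise as a reading of the simulation example; the handling of X10 = 0 and the function names are likewise assumptions.

```python
import math

def make_inputs(k, N):
    """Inputs X1..X10 (returned as a list with indices 0..9) for sample k of N."""
    a = math.pi * k / N
    return [
        0.8 * math.sin(2 * a),                           # X1
        0.75 * math.cos(2 * a),                          # X2
        0.5 * math.sin(a) + 0.26 * math.cos(2 * a),      # X3
        -0.99 + 1.5 * k / N,                             # X4
        0.62 * math.sin(2 * a) + 0.23 * math.sin(a),     # X5
        1.0 / (1.0 + k / N),                             # X6
        0.8 * math.sin(4 * a),                           # X7
        0.85 * math.cos(4 * a),                          # X8
        0.25 * math.sin(2 * a) + 0.46 * math.cos(2 * a), # X9
        math.sin(2 * a),                                 # X10
    ]

def target(x):
    """Piecewise target; X10 == 0 is assigned to the second branch (assumption)."""
    if x[9] > 0:
        return x[2] * x[4] + x[6] * x[8]
    return x[1] * x[3] + x[5] * x[7]

N = 800
inputs = [make_inputs(k, N) for k in range(N)]
outputs = [target(x) for x in inputs]
```

All ten signals stay within the interval [-1, 1] used for training, so the test exercises the model inside its training domain.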



Fig. 3 shows the result of testing for 800 points. After the system has been modeled, 
the performance confidence interval can be used to determine the probability of a true 
output model. In this example the output model cumulative distribution function (cdf) 
was established through the probability distribution law. Figure 4 shows the cdf for 
the model. 

The MGMDH was applied to other nonlinear time series functions for forecasting 
purposes. The results of the simulation were more than satisfactory. 









9 Applications 

The GMDH can be used in many different fields. Some of the applications 
are as listed below [4,5,6,9,10]: 

• Manufacturing 

• Ecological System 

• Medical 

• Environmental 

• Economical 

• Acoustic and Seismic analysis 



10 Summary 

This paper has demonstrated the feasibility of the GMDH network in modeling 
nonlinear piecewise continuous functions, especially with a large input vector. The 
results of computer simulation with different nonlinear functions indicate that the 
GMDH, as a self-organized network, is capable of constructing a mathematical model of 
an unknown dynamic system. This network can be used in many applications, such as 
pattern recognition, prediction, and chaotic systems. 



References

1. Åström, K.J. and Wittenmark, B., Adaptive Control, Addison-Wesley (1999)

2. Efron, B., The Jackknife, the Bootstrap and Other Resampling Plans, Vol. 38, SIAM
(1982)

3. Hecht-Nielsen, R., Neurocomputing, Addison-Wesley (1990)

4. Ivakhnenko, A.G., “The Group Method of Data Handling: A Rival of the Method
of Stochastic Approximation”, Soviet Automatic Control, Vol. 13, No. 3 (1966)

5. Ivakhnenko, A.G., “Polynomial Theory of Complex Systems”, IEEE Transactions
on Systems, Man, and Cybernetics, Vol. SMC-1, No. 4, Oct. (1971)

6. Ivakhnenko, A.G., “Information Processing GMDH”, Soviet Automatic Control
(1990)

7. Papoulis, A., Probability, Random Variables and Stochastic Processes,
McGraw-Hill (1965)

8. Söderström, T. and Stoica, P., System Identification, Prentice Hall (1989)

9. Stepashko, V.S., “A GMDH Algorithm for Two Level Modeling of
Multidimensional Cyclic Processes”, Soviet Automatic Control (1988)

10. Yurachkovskiy, Y.P., “Convergence of Multilayer Algorithm of the Group
Method of Data Handling”, Soviet Automatic Control (1981)




Performance of MGMDH Network on Structural Piecewise System Identification 739 




[Figure: axis label "probability"]














Black-Box Identification of the Electromagnetic 
Torque of Induction Motors: Polynomial and 
Neural Models 



Lucia Frosini and Giovanni Petrecca 

Department of Electrical Engineering, University of Pavia

Via Ferrata 1, 27100 Pavia, Italy
{lucia, petrecca}@unipv.it



Abstract. In this paper we examine the problem of determining the value of the
steady-state electromagnetic torque in induction motors installed in industrial
plants. The models derived from two parametric black-box identification
techniques (polynomial and neural) are implemented and tested for two motors
and compared with the analytical model provided by the equivalent circuit
theory. Both perform better than the latter; the best performance is given by
the neural model.



1. Introduction 

The study of the steady-state electromagnetic torque (Te) in induction motors installed
in industrial plants is important because it gives information about the working
conditions of mechanical loads.

Physical models of the electromagnetic torque require knowledge of some variables
that are usually not available in normal operating conditions, such as the equivalent
circuit parameters (rotor resistance R'r, rotor leakage inductance L'r, magnetizing
inductance L'm). In fact, the induction machine equivalent circuit theory provides the
following equation:

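\[
T_e = \frac{3\, p_p\, R'_r\, L'^{2}_m\, I_s^{2}}
           {(2\pi f - p_p \omega_m)\left[\dfrac{R'^{2}_r}{(2\pi f - p_p \omega_m)^{2}} + (L'_r + L'_m)^{2}\right]}
\qquad (1)
\]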

According to the above expression, the torque is calculated from the values of the
phase stator current Is, rotor speed ωm, supply frequency f, number of pole pairs pp
and the equivalent circuit parameters. Whereas we can easily measure the first three
variables, it is very difficult to know the exact values of the equivalent circuit
parameters. Sometimes the manufacturers can provide these values, but often it is
necessary to calculate them through special tests; in this way, we can obtain only
approximate values for the equivalent circuit parameters, and thus an approximate
value for Te. Because Te depends on the stator current Is and the rotor speed ωm,
we can use a black-box identification approach to determine the dependence of Te
upon these variables.
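To make the role of the equivalent circuit parameters concrete, the following Python sketch evaluates the torque from the standard steady-state equivalent circuit with the stator current imposed. It assumes the usual textbook form Te = 3 pp R'r L'm² Is² / (ωs [(R'r/ωs)² + (L'r + L'm)²]), with slip angular frequency ωs = 2πf - pp ωm; this form, the function name and the default arguments (the nameplate values of Table 1 for the 1.5 kW motor, with Is taken as the star-connection phase current) are assumptions of the example rather than a transcription of the authors' computation.

```python
import math


def equivalent_circuit_torque(i_s, omega_m, f=50.0, pp=2,
                              r_r=3.742, l_r=11.5e-3, l_m=289e-3):
    """Steady-state torque [Nm] from stator current [A] and rotor speed [rad/s].

    r_r, l_r, l_m are the rotor resistance [ohm], rotor leakage inductance [H]
    and magnetizing inductance [H]; the defaults are the Table 1 values.
    """
    w_slip = 2.0 * math.pi * f - pp * omega_m  # slip angular frequency
    if abs(w_slip) < 1e-12:
        return 0.0  # no torque is produced at synchronous speed
    denom = w_slip * ((r_r / w_slip) ** 2 + (l_r + l_m) ** 2)
    return 3.0 * pp * r_r * l_m ** 2 * i_s ** 2 / denom


# Rated operating point of the 1.5 kW motor (Table 1): 3.7 A, 148.7021 rad/s.
rated_torque_estimate = equivalent_circuit_torque(3.7, 148.7021)
```

At the rated point this sketch gives a value somewhat above the 10 Nm rated torque, which is in line with the later remark that friction and windage losses are included in this expression.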



R. Loganantharaj et al. (Eds.): IEA/AIE 2000, LNAI 1821, pp. 741-748, 2000.
© Springer-Verlag Berlin Heidelberg 2000







The aim of this paper is to present and compare two different black-box identification
techniques for torque modeling: polynomial models and neural models (Multi-Layer
Perceptron). The first step is the application of these techniques to a 1.5 kW
three-phase squirrel-cage induction motor in order to identify models of Te. The
reliability of the identified models is then tested on data gathered from a 3 kW
induction motor [1].



2. Experimental Set-Up 

For the experimental phase, a test bench equipped with an induction motor coupled
with a brake was used. The motor is supplied directly from the mains (without an
inverter) and the brake is regulated by a control and measurement system. Table 1
reports the plate data of the 1.5 kW motor, as provided by the manufacturer.



Table 1. Parameters of the 1.5 kW motor (provided by the manufacturer)

PARAMETERS                   SYMBOL   VALUES     UNIT
Rated power                  Pn       1.5        kW
Rated voltage Y/Δ            Vn       380/220    V
Rated current Y/Δ            In       3.7/6.4    A
Rated speed                  ωn       148.7021   rad/s
Rated torque                 Tn       10         Nm
Supply frequency             f        50         Hz
Magnetizing inductance       L'm      289        mH
Stator leakage inductance    Ls       9.66       mH
Rotor leakage inductance     L'r      11.5       mH
Stator resistance            Rs       4.632      Ω
Rotor resistance             R'r      3.742      Ω
Number of pole pairs         pp       2



The experimental test is based on the measurement of the following values: average
electromagnetic torque Te, rms value of the stator current Is, and average rotor
speed ωm. The measurement was repeated with the load varying from 2 Nm (the brake
threshold) up to the rated electromagnetic torque, 10 Nm. For each of the 26
measurements, performed at a constant imposed torque, the acquisition time is 1 s
and the sampling frequency is 1 kHz.



3. The Black-Box Identification 

Black-box identification consists of inferring a relationship between the inputs and
outputs of a system on the basis of experimental data. It represents an alternative to
analytical modeling when it is not possible to obtain reasonable models using only
physical insight or when the model based on physical insight contains a number of
unknown parameters.







Two main types of model structures can be used: non-parametric and parametric
models. Some examples of non-parametric models are the step response, the impulse
response and the frequency diagram (Bode plot). Parametric models are characterized
by a parameter vector θ [2]. In our research only parametric models have been
considered.

Let x1, x2, ..., xn be the inputs of the system, y the output and y = f(x1, x2, ..., xn)
the function to be identified. This is a multi-input single-output (MISO) system. With
reference to the problem, x1 and x2 stand for Is and ωm respectively, while y is Te.

Black-box identification through parametric models consists of determining the
parameter vector θ from a set of observed data (called the identification or training
set) including a set of input variable values x1(t), x2(t), ..., xn(t) and the
corresponding output values y(t):

\[
Z^N = \{\, [x_1(t), x_2(t), \ldots, x_n(t), y(t)] \mid t = 1, \ldots, N \,\} \qquad (2)
\]

The goal of the identification (or training) phase is to determine a mapping
Z^N → θ̂ from the set of training data to the set of possible parameters so that the
model will produce outputs ŷ(t) which in some sense are “close” to the true values
y(t). To solve this problem, a measure of closeness in terms of the Mean of Squared
Residuals (MSR) is introduced:

\[
\mathrm{MSR}(\theta, Z^N) = \frac{1}{N} \sum_{t=1}^{N}
\bigl(y(t) - \hat{y}(t \mid \theta)\bigr)^{T} \bigl(y(t) - \hat{y}(t \mid \theta)\bigr) \qquad (3)
\]

The parameters are then found as:

\[
\hat{\theta} = \arg\min_{\theta} \mathrm{MSR}(\theta, Z^N) \qquad (4)
\]

Equation (4) is solvable in closed form when the dependence of ŷ(t|θ) on θ is
linear, or by an iterative algorithm when this dependence is non-linear. Finally, the
derived model has to be validated on a fresh set of data, called the validation set.
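The closed-form case can be illustrated with the simplest linear-in-parameters model, ŷ = θ0 + θ1 x, for which minimizing the mean of squared residuals has a well-known explicit solution. The Python sketch below is only an illustration under that assumption; the function names and the single-input model are hypothetical and are not the models used in the paper.

```python
def msr(y_true, y_pred):
    """Mean of squared residuals between measured and model outputs."""
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)


def fit_line(x, y):
    """Closed-form least-squares estimate of (theta0, theta1) for theta0 + theta1*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    theta1 = sxy / sxx
    theta0 = mean_y - theta1 * mean_x
    return theta0, theta1


def predict_line(theta, x):
    """Evaluate the fitted line on a list of inputs."""
    theta0, theta1 = theta
    return [theta0 + theta1 * xi for xi in x]
```

The same `msr` function can be applied to a separate validation set to compare models, which is how the criterion is used in the rest of this section.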

So, the first step of black-box identification through parametric models is to decide
the structure of the relationship y = f(x1, x2, ..., xn). The second step is to estimate
the parameters of the model selected in the first step by minimizing the MSR on the
identification (or training) data, as shown in (4). The same kind of MSR criterion,
with respect to the validation set, is used to give an objective assessment of the
relative merits of models having different complexity (e.g. linear vs. quadratic) or
even different structure (e.g. polynomial vs. neural networks) [3].

In this study, we subdivided the 26 observed groups of data into two sets of 13 vectors,
used for identification and validation respectively. The problem was to find a
functional relationship Te = Te(Is, ωm) compatible with the collected data. Two
particular types of parametric models have been used: polynomial and neural models.







3.1 Polynomial Identification 

In this case, a polynomial dependence of y on the input variables x1 and x2 is assumed.
For instance, the most general second-order polynomial would be:

\[
y = f(x_1, x_2) = a_{0} + a_{10} x_1 + a_{01} x_2 + a_{20} x_1^2 + a_{11} x_1 x_2 + a_{02} x_2^2 \qquad (5)
\]

Once the structure of the model has been fixed, the estimation of the parameters aij is
straightforward because the model is linear in the parameters and (4) can be solved in
closed form. A much more challenging problem is to determine the optimal structure
and the order of the model. The simplest approach is to consider various structures,
use the identification data to identify the parameters and choose the model that yields
the best fit when applied to the validation data.
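The procedure described above can be sketched in Python as follows: build the monomials up to a given total order, solve the least-squares problem through the normal equations, and keep the order with the lowest MSR on the validation data. This is a minimal sketch under stated assumptions; the function names are hypothetical, the full set of monomials is kept for each order (the paper's best model retains only some terms), and the normal equations are used for brevity although they can be ill-conditioned for high orders or unscaled inputs.

```python
def features(x1, x2, order):
    """Monomials x1^i * x2^j with i + j <= order, ordered as in Eq. (5)."""
    return [x1 ** (d - j) * x2 ** j for d in range(order + 1) for j in range(d + 1)]


def solve(a, b):
    """Solve the square system a*x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            factor = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= factor * m[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit(inputs, outputs, order):
    """Least-squares coefficients via the normal equations (Phi^T Phi) a = Phi^T y."""
    phi = [features(x1, x2, order) for x1, x2 in inputs]
    p = len(phi[0])
    ata = [[sum(row[i] * row[j] for row in phi) for j in range(p)] for i in range(p)]
    aty = [sum(row[i] * y for row, y in zip(phi, outputs)) for i in range(p)]
    return solve(ata, aty)


def msr(theta, inputs, outputs, order):
    """Mean of squared residuals of the polynomial model on a data set."""
    total = 0.0
    for (x1, x2), y in zip(inputs, outputs):
        y_hat = sum(t * f for t, f in zip(theta, features(x1, x2, order)))
        total += (y - y_hat) ** 2
    return total / len(outputs)


def select_order(id_x, id_y, val_x, val_y, orders):
    """Fit each order on the identification set; pick the lowest validation MSR."""
    scores = {}
    for order in orders:
        theta = fit(id_x, id_y, order)
        scores[order] = msr(theta, val_x, val_y, order)
    best = min(scores, key=scores.get)
    return best, scores
```

With the 13 identification vectors and 13 validation vectors of this study, `select_order` would be called with the candidate orders (for example 1 to 5) to reproduce the kind of comparison reported below.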

In the present case it has been observed that the quite poor performance of a
first-order model improves by upgrading to the second order. Further improvements
are obtained by increasing the order up to the fourth, while more complex models
give worse performance. So, the best polynomial model is the following:

\[
\begin{aligned}
y = {} & 0.1882 + 0.5806\, x_1 - 0.4780\, x_2 - 0.1066\, x_1^2 - 0.1813\, x_1 x_2 - 0.2929\, x_2^2 \\
       & - 0.0382\, x_{?}^{?} - 0.0478\, x_{?}^{?} + 0.0143\, x_{?}^{?}
\end{aligned}
\qquad (6)
\]

and its performance is: MSR_I = 0.0065, MSR_V = 0.0126.



3.2 Neural Network Identification 

On the basis of the good results obtained with the polynomial identification, the neural
network identification is expected to work adequately even with a quite simple
structure. For this reason a Multi-Layer Perceptron (MLP) network with a single hidden
layer has been used. To choose the number of hidden neurons we used a “trial and
error” method: starting from the simplest model (one neuron with a linear function)
and increasing the complexity of the network (one hyperbolic neuron, two linear
neurons, two hyperbolic neurons, etc.). The method stops when the performance of the
network, in terms of MSR_V, becomes worse.
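The stopping rule of this trial-and-error search can be sketched in Python as below. The function name, the callable `validation_msr` and the candidate list are hypothetical; the sketch only encodes the rule "increase the complexity until the validation MSR gets worse" and does not train any network.

```python
def select_by_trial_and_error(candidates, validation_msr):
    """Walk through candidates ordered by increasing complexity.

    validation_msr(candidate) must return the validation MSR of the model
    trained for that candidate. The search stops at the first candidate whose
    validation MSR is worse than the best one found so far.
    """
    best_candidate, best_score = None, float("inf")
    for candidate in candidates:
        score = validation_msr(candidate)
        if score > best_score:
            break  # performance became worse: stop the search
        best_candidate, best_score = candidate, score
    return best_candidate, best_score
```

In practice `validation_msr` would wrap the training of a network with the given number and type of hidden neurons, repeated over several random initializations, followed by its evaluation on the validation set.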

The neural network estimate requires an iterative algorithm and therefore an
initialization of the parameters to be estimated. Since the initialization is usually
random, it is necessary to run the identification of each neural model several times
in order to obtain significant results.

In order to estimate a neural model, the type of algorithm and the performance
function to be minimized by the algorithm have to be fixed.

In this case the Levenberg-Marquardt algorithm has been used because of its rapid
convergence and robustness. The performance function is generally the mean of
squared residuals, as reported in (3), to which a “regularization” term is added, so
that the performance function becomes:

\[
\mathrm{MSR}(\theta, Z^N) = \frac{1}{N} \sum_{t=1}^{N}
\bigl(y(t) - \hat{y}(t \mid \theta)\bigr)^{T} \bigl(y(t) - \hat{y}(t \mid \theta)\bigr)
+ \frac{1}{n}\, \theta^{T} D\, \theta \qquad (7)
\]







where N is the number of identification data, n is the number of weights of the neural
network, D is a diagonal matrix such that D = (1-γ)I and γ is a constant. In practice,
the regularization term consists of the mean of squared network weights “weighted”
with the constant (1-γ). Equation (7) can also be written in the following form:

\[
\mathrm{MSR}(\theta, Z^N) = \gamma\, \frac{1}{N} \sum_{t=1}^{N} e_t^{2}
+ (1-\gamma)\, \frac{1}{n} \sum_{j=1}^{n} w_j^{2} \qquad (8)
\]

where e_t is the generic residual (the difference between the actual value y(t) and
the corresponding output of the model) and w_j is the generic weight of the network [4].
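A regularized performance function of this kind can be sketched in Python as follows. The sketch assumes the weighted form in which γ multiplies the mean of squared residuals and (1-γ) multiplies the mean of squared weights, which is consistent with the statement that the regularization term weighs less as γ approaches 1; the function name and argument layout are hypothetical.

```python
def regularized_msr(residuals, weights, gamma):
    """gamma * mean(e_t^2) + (1 - gamma) * mean(w_j^2).

    residuals: list of e_t = y(t) - y_hat(t) on the identification data
    weights:   list of all network weights w_j
    gamma:     constant in [0, 1]; gamma = 1 disables the regularization
    """
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma must lie between 0 and 1")
    mean_sq_residual = sum(e * e for e in residuals) / len(residuals)
    mean_sq_weight = sum(w * w for w in weights) / len(weights)
    return gamma * mean_sq_residual + (1.0 - gamma) * mean_sq_weight
```

With γ = 0.9, for example, large weights are penalized only mildly, so the fit to the identification data dominates the criterion.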

To assess the performance of different neural models, the following parameters have
been varied:

i) number of hidden neurons and their activation function: the activation function of
the hidden neurons can be linear or hyperbolic;

ii) value of the “goal” that the performance function has to achieve in order to stop
the training phase;

iii) value of the parameter γ in (8), which can be varied from 0 to 1: the closer γ is
to 1, the lower the weight of the regularization term.

The procedure starts with a network with a single linear hidden neuron, and several
tests are made, setting the value of the goal to 0.01 and varying the value of γ. The
results of these tests show that networks with linear hidden neurons always produce
the same performance, even if the number of neurons increases and the value of γ
varies over a quite wide range (from 0.4 to 1): MSR_I = 0.1501, MSR_V = 0.1399.

Using hyperbolic hidden neurons, the neural network performance improves. In these
cases, the values of γ related to the best neural models are > 0.9.

The best neural model has 3 hyperbolic neurons, goal = 0.01 and γ = 0.9. Its
performance is slightly better than the performance of the best polynomial model:

MSR_I = 0.0081, MSR_V = 0.0125.

In the opinion of the authors, the improvement obtained by neural models is only slight
because of the quite regular dependence of the electromagnetic torque on the rms value
of the stator current and on the rotor speed. So, for the collected data, this
dependence can be explained with sufficient precision through a classical polynomial
model.

In order to improve the neural network performance, the squared terms of the stator
current and rotor speed have been added to the inputs of the network. After several
tests, we obtained poor results, worse than the previous ones: MSR_I = 0.0011,
MSR_V = 0.0367.

This failure is due to the small number of data used for the neural network training
(13 for each variable) compared to the number of parameters to estimate, which
increases as the number of inputs and the number of hidden neurons increase. In other
cases, it has been proven that adding squared terms of input variables improves the
neural network performance [5].







4. Performances of Black-Box and Physical Models 

By applying the classical equivalent circuit theory to the induction machine, we
obtain the expression of the electromagnetic torque (1) as a function of the one-phase
stator current Is, rotor speed ωm, supply frequency f, number of pole pairs pp and
equivalent circuit parameters (R'r, L'r, L'm). We calculated the electromagnetic torque
values using the parameters of the motor (see Table 1) and the values of Is and ωm
collected in the 26 measurements. Then, we compared these calculated values to the
torque values collected during the 26 measurements. This comparison is shown in
Figure 1. The error between the calculated values and the collected ones may be partly
explained by the friction and windage losses, which depend strictly on the rotor
speed and become more relevant in low load conditions. These losses are included in
the expression of the electromagnetic torque (1), whereas they have to be subtracted
from this expression in order to obtain the value of the torque available at the load
shaft. Figure 1 also reports the values calculated by using the best polynomial and
neural models. This comparison highlights the good performance of these models.
Defining the performance of the models as the average absolute error in percentage
(AAE) and as the MSR calculated on all the data, we have:

• equivalent circuit theory: AAE = 36.82% MSR = 2.5898;

• polynomial model: AAE = 1.90% MSR = 0.0095;

• neural model: AAE = 1.88% MSR = 0.0092.
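The two performance indices listed above can be computed as in the following Python sketch. The text does not spell out the AAE formula, so the sketch assumes that each absolute error is expressed as a percentage of the corresponding measured torque before averaging; the function names are hypothetical.

```python
def average_absolute_error_percent(measured, estimated):
    """AAE [%]: mean of |measured - estimated| / |measured|, times 100 (assumed form)."""
    ratios = [abs(m - e) / abs(m) for m, e in zip(measured, estimated)]
    return 100.0 * sum(ratios) / len(ratios)


def mean_squared_residual(measured, estimated):
    """MSR: mean of the squared differences between measured and estimated values."""
    return sum((m - e) ** 2 for m, e in zip(measured, estimated)) / len(measured)
```

Both functions would be applied to the 26 measured torque values and to the corresponding outputs of each of the three models.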



Comparison of calculated and collected values of torque (1.5 kW motor) 




Fig. 1. Comparison of the calculated values through the three models and the collected values 
of torque (polynomial model is almost superimposed on neural model) 






5. Performances of the Identified Models Applied to Other Motors 

The models obtained by system identification are relatively easy to construct and use.
On the other hand, they present some drawbacks in contrast to the models based
solely on mathematical modeling (i.e. physical insight). In particular, they have
limited validity: they are valid for a certain working point, a certain type of input,
a certain process, etc. [2]. For this reason, the performance of the identified models
(polynomial and neural) has been tested using some data collected from a 3 kW
induction motor. The rated torque of this motor is 19.5 Nm, whereas the rated torque
of the 1.5 kW motor is 10 Nm. Only the data corresponding to values of torque in the
range from 2 Nm to 10 Nm, collected with a step of 1 Nm, have been used. This gives
nine sets of measurements. First, we tested the best polynomial model. The
results are very poor: MSR_V = 28.0999, AAE_V = 93.40%. Second, we tested the best
neural model. Its performance is considerably better than that of the polynomial model:
MSR_V = 1.6515, AAE_V = 21.90%. Figure 2 compares the outputs of the best polynomial
and neural models with the collected values for the 3 kW motor. The results provided
by the neural model lead to the conclusion that it is possible to know an approximate
average torque value of a motor by using a model identified and validated on another
motor, without knowing the equivalent circuit parameters. This approximation is also
better than the approximation obtained by the equivalent circuit theory (Section 4).



Comparison of calculated and collected values of torque (3 kW motor) 




Fig. 2. Comparison of the outputs of the polynomial and neural models and the collected
values of torque (3 kW motor)






6. Conclusions 

In this paper the problem of evaluating the steady-state electromagnetic torque (Te)
in induction motors installed in industrial plants has been examined.

Two parametric black-box identification techniques are analyzed as alternatives to the
mathematical model provided by the equivalent circuit theory. In fact, the latter model
requires the knowledge of some data usually not available in normal operating
conditions (the equivalent circuit parameters). By contrast, black-box identification
models make it possible to determine the average value of torque using only variables
that are easy to measure: the rms value of the stator current Is and the average
rotor speed ωm.

The performance of the polynomial and neural models is compared. Both models
provide an average absolute error lower than 2%. The neural network performance is
slightly better than the polynomial performance when the models are used to estimate
the average torque value of the same motor employed for the data collection.

The performance of the identified models is also tested on a motor of a different size.
The results show a poor performance of the polynomial model and a fairly good
performance of the neural model.



References 



1. Frosini L., Impiego delle reti neurali per applicazioni nel campo della conversione
dell'energia (Applications of neural networks in the energy conversion field), Ph.D.
thesis in Electrical Engineering, University of Pavia (Italy), October 1999 (in Italian).

2. Söderström T. and Stoica P., System Identification, Prentice Hall, 1989.

3. De Nicolao G., Scattolini R. and Siviero C., “Identification of the volumetric efficiency
of IC engines: parametric, non-parametric and neural techniques”, Control Eng. Practice,
4, 1405-1415, 1996.

4. Demuth H. and Beale M., Neural Network Toolbox for Use with MATLAB®, The
MathWorks Inc., 1998.

5. Chow M. Y., Methodologies of using Neural Network and Fuzzy Logic technologies for
motor incipient fault detection, World Scientific, 1997.




Author Index

Aguilar, Jose  561
Agusti, Jaume  660
Alptekin, Serna  651
Al-Shihi, Badria  334
Angeli, Chriss  184
Aref, Mostafa M.  591
Artikis, Alexander  4
Bachler, Gernot  109
Bayyapu, P.  306
Benbasat, Izak  414
Benton, Ryan  434
Berger, Martin  109
Bi, W.  210
Bicharra Garcia, Ana C.  316
Bottaci, L.  390
Boukachour, Hadhoum  711
Brennan, Mike  46
Bridges, Susan  85
Brito, Luis  14
Bui, Rung T.  671
Büker, Ulrich  502
Bultman, Arne  139
Cakic, Jovan  351
Cardon, Alain  711
Carver, Doris L.  79
Chang, S.-H.  623
Chen, Chih-Ming  555
Chung, Paul W. H.  334
Clay, Paul  99
Coffey, John  406
Correa da Silva, F. S.  660
Crispin, Alan  99
Crossley, Sam  99
Cui, Yi  482
Czejdo, Bogdan D.  236
De Beuvron, Francois  357
Debenham, John  699
Delaney, John  380
Deng, X.  629
Duvallet, Claude  711
Easwaran, Aneurin M.  119
El Ayeb, Bechir  192
Elhadef, Mourad  192
Elrick, R.  56
Ersak, Aydin  705
Felfernig, A.  24
Feraday, Simon  46
Fettahlioglu, Mahmut  705
Fillion, C.  671
Fimbel, Eric  284
Foley, Harold  260
Forsyth, Graham  380
Friedrich, G.  24
Frosini, Lucia  741
Frost, F.  73
Fusaoka, Akira  198
Gaura, Elena  534
Gavriel, Yosef  168
Geiselbrechtinger, Franz  512
Ghent, J.  210
Gibbons, W. M.  93
Golm, Florian  345
Grigg, R. B.  623
Guerin, Frank  4
Guesgen, Hans W.  204
Günay, Cengiz  717
Hafez, Alaaeldin  220
Harmelen, Frank van  139
Harris, Chris  46
Hartmann, George  502
Hasegawa, Takaaki  573
Hempel, Oliver  502
Hendtlass, Tim  322
Herbert, T.  306
Hewett, Rattikorn  406
Hiratsuka, Satoshi  198
Hodges, Julia  85
Holdich, Richard G.  334
Hooker, Jeffery  306
Howie, D.  56
Hwang, Cheng-Wei  555
Itoh, Hidenori  420
Jannach, D.  24
Janoski, G.  623
Jensen, Finn V.  367
John, Ulrich  396
Kalmes, Donald  723
Kang, Soon-Ju  34
Karam, Orlando  645
Karasik, Y. B.  328
Karri, V.  67, 73
Kato, Kiyoshi  651
Kjærulff, Uffe  367
Klinkert, Mike  230
Kojiri, Tomoko  242
Kokol, Peter  611
Kolluru, Ramesh  306
Koo, Jeong Seon  204
Kose, Kuniji  149
Kubat, Miroslav  426, 434
Kuipers, Joris  139
Kullmann, Martina  357
Kunej, Andrej  611
Kuo, Pikuei  149
Kurkovsky, Stanislav  272
LaBauve, G.  306
Lalitrojwong, Pattarachai  549
Lee, Chin Keong  492
Lee, Hahn-Ming  555
Leedham, Graham  492
Leigh, William  617
Levy, Francois  463
Li, F.-S.  290, 623
Li, Hong  731
Li, Hujun  290
Lindsay, Malcolm  46
Liu, Mei-Ling  651
Liu, S.  629
Loganantharaj, Rasiah  129, 272, 306, 434
Lui, Y.  390
Lutsky, Patti  583
Maciel, Paula Marisa  316
Mangina, E. E.  683
Mantay, Thomas  474
Mantyla, M.  93
Martin, E. A.  56
Mateis, Cristinel  174
McArthur, S. D. J.  56, 160, 683
McClendon, R. W.  543
McDonald, J.  56, 160, 683
Melo, Ana Christina de  660
Menal, J.  160
Miyazaki, Tsuyoshi  420
Montes de Oca, Carlos  79
Moon, Jae-Chul  34
Moriarty, David E.  453
Moyes, A.  56, 160
Murakami, Tomoko  639
Nakamura, Tsuyoshi  420
Nakatsu, Robbie  414
Neves, Jose  14, 601
Neves Ferraz, Inhaiima  316
Nikolopoulos, Chris  723
O'Boyle, Cara  512
Odisho, Edwin  617
Ohara, Hisashi  573
Orihara, Ryohei  639
Osmani, Aomar  463
Paik, James  567
Park, Nam-Seog  34
Paz, Mario  617
Paz, Noemi  617
Pendergraft, Curt  250
Perron, J.  671
Petrecca, Giovanni  741
Petry, Frederick E.  250, 260, 645, 693
Phan, Sieu  149
Pietrzyk, M.  623
Pimentel, Julio C. G.  168
Pitt, Jeremy  4, 119
Podgorelec, Vili  611
Potter, W. D.  210, 629
Raghavan, Vijay  220
Ranta, M.  93
Rao, Nageswara S. V.  192
Rauscher, H. M.  629
Rider, Richard J.  534
Rigas, D. I.  390
Robertson, David S.  660
Rocha, Miguel  601
Rodrigues, M. A.  390
Rossell, Valentina  561
Sachdev, Sharad  443
Scherer, Stefan  109
Schmitz, Eber A.  168
Scott, T. M.  93
Serrano, Miguel A.  79
Setoodehnia, Ali K.  731
Sewisy, Adel A.  522
Shareef, Iqbal  723
Shaw, Kevin  645
Shin, Kihong  46
Silvio, H. E.  543
Simmons, Steve  306
Skaanning, Claus  367
Skjellum, Anthony  85
Smirnov, Alexander V.  345
Smith, S.  306
Smolinski, Brent A.  296
Smyth, Barry  512
Sobaniec, Cezary  236
Somasekar, S.  629
Somodevilla, Maria J.  693
Spenser, James  306
Steele, J. A.  56, 160
Steele, Nigel  534
Stergiou, Christos  4
Stumptner, Markus  174
Sung, Andrew H.  290, 623
Suzuki, Daisuke  420
Teske, M.  210
Thistle, H.  210
Thomas, Bushrod  129
Thomasma, S.  629
Tollner, E. W.  543
Treur, Jan  230
Trudel, Andre  443
Twardus, D.  210
Twery, M. J.  210
Varshney, Pramod K.  1
Vasconcelos, W. A.  660
Verwaart, Tim  230
Vilela, Carla  601
Watanabe, Toyohide  242
Weiss, William W.  290
Wooley, Bruce  85
Wotawa, Franz  174
Yamada, Koji  420
Yang, Chunsheng  149
Yang, Tao  651
Yazici, Adnan  250
Yokota, Takehiko  639
Young, D.  56
Yule, I. Y.  56
Zanker, M.  24
Zeghib, Yacine  357
Zhou, Ming  567
Zhou, Nan  482
Zizka, Jan  426




